Cassandra keyspace does not propagate to newly added node (after previous successful additions and removals)

Question


I tested rotating through a 4-node cluster by adding and removing nodes in a circular fashion, so that the cluster membership followed this repeating sequence:

{1, 2, 3}, {2, 3}, {2, 3, 4}, {3, 4}, {1, 3, 4}, {1, 4}, {1, 2, 4}, {1, 2}, {1, 2, 3}, {2, 3}, {2, 3, 4}, {3, 4}, {1, 3, 4}, {1, 4}, ...
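To make the rotation pattern explicit, here is a minimal Python sketch that reproduces it, reading the sequence as successive membership sets. It is only my reading of the sequence, not a script used in the test; the function name `rotation` and the `node_count` parameter are made up for illustration.

```python
from collections import deque
from itertools import cycle, islice


def rotation(node_count=4):
    """Yield successive cluster memberships, alternating remove and add steps."""
    # Start with every node except the last one, oldest member first.
    members = deque(range(1, node_count))
    # Nodes are (re)added in ring order: 4, 1, 2, 3, 4, ...
    to_add = cycle([node_count, *range(1, node_count)])
    while True:
        yield sorted(members)
        members.popleft()              # "remove node": drop the oldest member
        yield sorted(members)
        members.append(next(to_add))   # "add node": wipe and rejoin the next node


if __name__ == "__main__":
    for membership in islice(rotation(), 9):
        print(membership)
```

Each full cycle is 8 operations long, and the add step that failed in my test is the one that goes from {1, 2} back to {1, 2, 3}.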

A node was added by stopping Cassandra, clearing /var/lib/cassandra/* and restarting Cassandra (with the same cassandra.yaml file, which lists nodes 1 and 2 as seeds). A node was removed by stopping Cassandra on it and then issuing nodetool removenode $nodeId from another node. In all cases, the next operation did not start until the previous one had completed.

The above membership sequence was repeated several times. After about 4 iterations, I performed the "add node" operation to go from the cluster of nodes {1, 2} to the cluster of nodes {1, 2, 3}. On this iteration, my custom keyspace did not propagate to node 3. The nodetool status looked fine:

    $ nodetool status
    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
    UN  192.168.12.206  164.88 KB  256     66.2%             7018ef8a-af08-40e9-b3d3-065f4ba6eb0d  rack1
    UN  192.168.12.207  60.85 KB   256     63.2%             ff18b636-6287-4c70-bf23-0a1a1814b864  rack1
    UN  192.168.12.205  217.19 KB  256     70.6%             2bc38fa8-42a1-457f-84d7-35b3b46e1daa  rack1

But cqlsh on node 3 did not know about my keyspace. I tried running nodetool repair, which seemed to hang indefinitely while spewing the following pair of stack traces into the log:

    WARN [Thread-9781] 2014-09-16 19:34:30,081 IncomingTcpConnection.java (line 83) UnknownColumnFamilyException reading from socket; closing
    org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=08768b1d-97a1-3528-8191-9acee7b08ef4
        at org.apache.cassandra.db.ColumnFamilySerializer.deserializeCfId(ColumnFamilySerializer.java:178)
        at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:103)
        at org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:145)
        at org.apache.cassandra.service.paxos.Commit$CommitSerializer.deserialize(Commit.java:134)
        at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:153)
        at org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion(IncomingTcpConnection.java:130)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:74)
    ERROR [Thread-9782] 2014-09-16 19:34:31,484 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-9782,5,main]
    java.lang.NullPointerException
        at org.apache.cassandra.db.RangeSliceCommandSerializer.deserialize(RangeSliceCommand.java:247)
        at org.apache.cassandra.db.RangeSliceCommandSerializer.deserialize(RangeSliceCommand.java:156)
        at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
        at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:153)
        at org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion(IncomingTcpConnection.java:130)
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:74)

Any ideas what is happening and how to fix it (ideally, both a reliable way to repair the cluster and a way to avoid getting into this state in the first place)?


cassandra

jonderry Sep 17 '14 at 2:36


1 answer

phact · Answer 1 · 2014-12-23T22:49:20+0000

If the nodes disagree on the schema version, you can tell by running nodetool describecluster.
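If you want to check this from a script, here is a minimal Python sketch that parses saved describecluster output. It assumes the output has a "Schema versions:" section with lines of the form `<version>: [<address>, ...]`, which is how I remember nodetool printing it; verify against your version. The helper names and the version IDs in `SAMPLE` are made up for illustration (only the addresses come from the question).

```python
import re

# Illustrative output only: the version IDs are placeholders, not real data.
SAMPLE = """\
Cluster Information:
    Name: Test Cluster
    Schema versions:
        11111111-1111-1111-1111-111111111111: [192.168.12.205, 192.168.12.206]
        22222222-2222-2222-2222-222222222222: [192.168.12.207]
"""


def schema_versions(describecluster_output):
    """Map each schema version to the list of nodes reporting it."""
    versions = {}
    in_section = False
    for line in describecluster_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Schema versions:"):
            in_section = True
            continue
        if not in_section or not stripped:
            continue
        match = re.fullmatch(r"(\S+): \[(.*)\]", stripped)
        if match is None:
            break  # first non-matching line ends the section
        nodes = [n.strip() for n in match.group(2).split(",") if n.strip()]
        versions[match.group(1)] = nodes
    return versions


def has_disagreement(versions):
    """True if reachable nodes report more than one schema version."""
    reachable = [v for v in versions if v != "UNREACHABLE"]
    return len(reachable) > 1


if __name__ == "__main__":
    print(has_disagreement(schema_versions(SAMPLE)))
```

Unreachable nodes are listed under an `UNREACHABLE` entry rather than a version, so the sketch leaves that entry out when counting versions.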

If you see more than one schema version, do the following on each node that has the wrong version:

Stop the Cassandra service/process, typically by running nodetool drain followed by sudo service cassandra stop or kill <pid>. At the end of this process, the commit log directory (/var/lib/cassandra/commitlog) should contain only a single small file.

Remove the Schema* and Migration* sstables inside your system keyspace (/var/lib/cassandra/data/system if you use the default settings).

After you start Cassandra again, this node will notice the missing information and pull the correct schema from one of the other nodes. In version 1.0.X and earlier, the schema is applied one mutation at a time. While it is being applied, the node may log messages such as the one below, saying that a column family cannot be found. These messages can be ignored.

    ERROR [MutationStage:1] 2012-05-18 16:23:15,664 RowMutationVerbHandler.java (line 61) Error in row mutation
    org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find cfId=1012

To confirm everything is on the same schema, verify that 'describe cluster;' only returns one schema version.
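Before removing anything in the sstable step above, it may help to preview which files would be affected. Here is a minimal Python sketch that only lists candidates and deletes nothing. It assumes the older on-disk layout the FAQ describes, where the schema sstables sit directly in the system keyspace directory; newer versions lay out the system keyspace differently, so check your own data directory first. The function name `schema_sstables` is made up for illustration.

```python
from pathlib import Path


def schema_sstables(system_dir):
    """Return files directly under system_dir named Schema* or Migration*."""
    root = Path(system_dir)
    found = []
    for pattern in ("Schema*", "Migration*"):
        # Only regular files directly in the directory; no recursion.
        found.extend(p for p in root.glob(pattern) if p.is_file())
    return sorted(found)


if __name__ == "__main__":
    # Default location from the FAQ; adjust if your data directory differs.
    for path in schema_sstables("/var/lib/cassandra/data/system"):
        print(path)
```

Only act on the listed files after the node has been drained and stopped.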

Source: https://wiki.apache.org/cassandra/FAQ
