
datastax - Cassandra restart issues while restoring to a new cluster

Problem description:

I am restoring a backup to a fresh Cassandra 2.2.5 cluster consisting of 3 nodes.

Initial cluster health of the NEW cluster:

--  Address    Load       Tokens  Owns  Host ID                               Rack
UN  10.40.1.1  259.31 KB  256     ?     d2b29b08-9eac-4733-9798-019275d66cfc  uswest1adevc
UN  10.40.1.2  230.12 KB  256     ?     5484ab11-32b1-4d01-a5fe-c996a63108f1  uswest1adevc
UN  10.40.1.3  248.47 KB  256     ?     bad95fe2-70c5-4a2f-b517-d7fd7a32bc45  uswest1cdevc

As part of the restore instructions in the DataStax docs, I do the following on the new cluster (a command-level sketch follows this list):

1) Stop Cassandra on each of the three nodes, one by one.

2) Edit cassandra.yaml on all three nodes with the backed-up token ring information. [Step 2 from docs]

3) Remove the contents of /var/lib/cassandra/data/system/*. [Step 4 from docs]

4) Start Cassandra on nodes 10.40.1.1, 10.40.1.2, and 10.40.1.3, in that order.
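
For reference, here is a minimal sketch of that sequence as shell commands, assuming a package install with cassandra.yaml under /etc/cassandra, the default data paths, and the service command as the service manager (adjust the paths, the service manager, and the token list to your own environment):

$ sudo service cassandra stop                    # step 1, on each node in turn

$ sudo vi /etc/cassandra/cassandra.yaml          # step 2, set initial_token to this node's backed-up token list

$ sudo rm -rf /var/lib/cassandra/data/system/*   # step 3, clear the system keyspace

$ sudo service cassandra start                   # step 4, on 10.40.1.1, then 10.40.1.2, then 10.40.1.3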

Result:

10.40.1.1 restarts successfully:

--  Address    Load       Tokens  Owns  Host ID                               Rack
UN  10.40.1.1  259.31 KB  256     ?     2d23add3-9eac-4733-9798-019275d125d3  uswest1adevc

But the second and third nodes fail to restart, stating:

java.lang.RuntimeException: A node with address 10.40.1.2 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
        at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:546) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:766) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:693) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:585) ~[apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:300) [apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:516) [apache-cassandra-2.2.5.jar:2.2.5]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:625) [apache-cassandra-2.2.5.jar:2.2.5]
INFO  [StorageServiceShutdownHook] 2016-08-09 18:13:21,980 Gossiper.java:1449 - Announcing shutdown

java.lang.RuntimeException: A node with address 10.40.1.3 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.

...

Eventual cluster health:

--  Address    Load       Tokens  Owns  Host ID                               Rack
UN  10.40.1.1  259.31 KB  256     ?     2d23add3-9eac-4733-9798-019275d125d3  uswest1adevc
DN  10.40.1.2  230.12 KB  256     ?     6w2321ad-32b1-4d01-a5fe-c996a63108f1  uswest1adevc
DN  10.40.1.3  248.47 KB  256     ?     9et4944d-70c5-4a2f-b517-d7fd7a32bc45  uswest1cdevc


I understand that the HostID of a node might change after system dirs are removed.

My question is:

Do I need to explicitly tell each node at startup to replace itself? Are the docs incomplete, or am I missing something in my steps?
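
(For context on the replace_address hint in that error: it is normally passed as a JVM system property at startup, for example by adding a line like the one below to cassandra-env.sh on the affected node; whether that is actually the right fix in this restore scenario is what the answers below get into.)

JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.40.1.2"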

Answer:

Turns out there were stale commitlog and saved_caches directories which I had missed deleting earlier. The instructions work correctly once those directories are deleted as well.
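
A minimal sketch of that cleanup, assuming the default commitlog_directory and saved_caches_directory locations under /var/lib/cassandra (check cassandra.yaml if yours differ), run while Cassandra is stopped:

$ sudo rm -rf /var/lib/cassandra/commitlog/*

$ sudo rm -rf /var/lib/cassandra/saved_caches/*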

Answer:

Usually in a situation like this, after I run

$ systemctl stop cassandra

I will run

$ ps aux | grep cassandra

and notice that Cassandra still has some processes up.

I usually do a

$ kill -9 <cassandra pid>

and then

$ rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/*

Answer:
java.lang.RuntimeException: A node with address 10.40.1.3 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.

If you are still facing the above error, it means the Cassandra process is still running on that node. First log in to node 10.40.1.3, then follow these steps:

$ jps

You see some processes running. For example:

9107 Jps
1112 CassandraDaemon

Then kill the CassandraDaemon process using the process id you see after executing jps. In my example, the process id for CassandraDaemon is 1112.

$ kill -9 1112

Then check the processes again after a while:

$ jps

You will see that CassandraDaemon is no longer listed.

9170 Jps

Then remove your saved_caches and commitlog directories and start Cassandra again. Do this on every node that is hitting the error mentioned above.
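
If you have to repeat this on several nodes, the lookup-and-kill step can be collapsed into a one-liner (a sketch, assuming jps and awk are available on the node):

$ kill -9 $(jps | awk '/CassandraDaemon/ {print $1}')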
