Cassandra COPY not working consistently

I tried to import a CSV with approximately 20 million lines.

I did a pilot run with a 100-row CSV to check that the columns were correct and that there were no parse errors. Everything went well.

Every time I tried to import the full 20-million-row CSV, it failed after a different amount of time. On my local machine it failed after 90 minutes with the following error; on the server it fails after about 10 minutes:

Processed 4050000 rows; Write: 624.27 rows/s
code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'write_type': 0, 'consistency': 1}
Aborting import at record #4050617. Previously-inserted values still present.
4050671 rows imported in 1 hour, 26 minutes, and 43.649 seconds.
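A quick back-of-envelope check on the summary line above shows how slow the run was even before it failed. The sketch below only uses the figures from that log (4050671 rows in 1 h 26 min 43.649 s); the helper names `import_rate` and `eta` are made up for illustration.

```python
from datetime import timedelta


def import_rate(rows: int, elapsed: timedelta) -> float:
    """Average rows per second over the whole run."""
    return rows / elapsed.total_seconds()


def eta(total_rows: int, rate: float) -> timedelta:
    """Time needed to import total_rows at a constant rate."""
    return timedelta(seconds=total_rows / rate)


# Figures taken from the failed run in the log above.
elapsed = timedelta(hours=1, minutes=26, seconds=43.649)
rate = import_rate(4_050_671, elapsed)

print(f"average rate: {rate:.0f} rows/s")          # about 778 rows/s
print(f"whole 20M-row file: {eta(20_000_000, rate)}")  # a little over 7 hours
```

So even without the timeout, COPY at this average rate would need roughly seven hours for the whole file, and the last progress line (624.27 rows/s) is below that average.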

In other words, the coordinator node timed out waiting for responses from the replica nodes. (This is a single-node cluster and the replication factor is 1, so why it is waiting for other nodes is a separate question.)
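The numeric `info` fields in the error can be decoded to see what the coordinator was actually waiting for. The sketch below assumes two mappings: the consistency codes defined by the CQL native protocol (1 = ONE), and the write-type numbering of the DataStax Python driver's `WriteType`, on which cqlsh is built (0 = SIMPLE). The function names are hypothetical.

```python
# Consistency-level codes as defined by the CQL native protocol.
CONSISTENCY_NAMES = {
    0x0000: "ANY", 0x0001: "ONE", 0x0002: "TWO", 0x0003: "THREE",
    0x0004: "QUORUM", 0x0005: "ALL", 0x0006: "LOCAL_QUORUM",
    0x0007: "EACH_QUORUM", 0x0008: "SERIAL", 0x0009: "LOCAL_SERIAL",
    0x000A: "LOCAL_ONE",
}

# Write types as numbered by the Python driver's WriteType (first few only).
WRITE_TYPE_NAMES = {0: "SIMPLE", 1: "BATCH", 2: "UNLOGGED_BATCH",
                    3: "COUNTER", 4: "BATCH_LOG", 5: "CAS"}


def required_acks(level: str, replication_factor: int) -> int:
    """Replica acknowledgements a write needs at `level` (single data center)."""
    if level in ("ONE", "LOCAL_ONE"):
        return 1
    if level == "TWO":
        return 2
    if level == "THREE":
        return 3
    if level in ("QUORUM", "LOCAL_QUORUM"):
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not handled in this sketch: {level}")


def describe_timeout(info: dict) -> str:
    """Turn the driver's info dict into a readable one-line summary."""
    level = CONSISTENCY_NAMES[info["consistency"]]
    write_type = WRITE_TYPE_NAMES.get(info["write_type"], str(info["write_type"]))
    return (f"{write_type} write at {level}: "
            f"{info['received_responses']}/{info['required_responses']} replica acks received")


info = {'received_responses': 0, 'required_responses': 1, 'write_type': 0, 'consistency': 1}
print(describe_timeout(info))  # SIMPLE write at ONE: 0/1 replica acks received
```

With a replication factor of 1 and consistency ONE, the single required acknowledgement comes from the node itself, which also acts as coordinator; the "replica nodes" wording is generic. Most likely the node's own write simply did not complete within `write_request_timeout_in_ms`, for example because it was overloaded.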

Then, following recommendations in another thread, I increased the write request timeout, although I was not sure this was the root cause:

write_request_timeout_in_ms: 20000 

( 300000)


, CSV 500 000 CSV. ( 0!). 2 5 .


Processed 460000 rows; Write: 6060.32 rows/s
Connection heartbeat failure
Aborting import at record #443491. Previously inserted records are still present, and some records after that may be present as well.
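Because COPY reports the record at which it aborted and says earlier rows are still present, one workaround is to restart from near that record instead of from the beginning. The sketch below is an assumption-laden helper, not part of cqlsh: `aborted_record` and `write_remainder` are hypothetical names, and since it is not clear whether cqlsh's record number is 0- or 1-based or how it relates to the progress counter, the usage subtracts a safety margin. Re-inserting rows that are already present just overwrites them for ordinary INSERTs (this is not true for counter tables).

```python
import csv
import re


def aborted_record(message: str) -> int:
    """Extract N from cqlsh's 'Aborting import at record #N' line."""
    match = re.search(r"Aborting import at record #(\d+)", message)
    if match is None:
        raise ValueError("no 'Aborting import at record #N' in message")
    return int(match.group(1))


def write_remainder(src_path: str, dst_path: str, start_row: int,
                    has_header: bool = True) -> int:
    """Copy data rows from start_row (0-based, header excluded) onward into a new CSV.

    The header, if present, is copied as well so the new file can be imported
    with the same COPY options. Returns the number of data rows written.
    """
    written = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        if has_header:
            header = next(reader, None)
            if header is not None:
                writer.writerow(header)
        for index, row in enumerate(reader):
            if index >= start_row:
                writer.writerow(row)
                written += 1
    return written


# Usage (hypothetical file names): restart a little before the reported record.
# start = max(0, aborted_record(log_text) - 10_000)
# write_remainder("data.csv", "data_rest.csv", start)
```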

, - Ctrl+C

, . 5 . Cassandra 10- , .

, COPY, : 10 000 . , 80 000 . 30 , 70 000 90 000 , 30 , CSV .

. , - , , .

Cassandra 2.2.3


, COPY, , , , .

SSTable, , .

, cassandra - script, CSV, . Python .
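The last answer appears to suggest loading the CSV with a custom Python script instead of COPY. Below is a minimal sketch of the control flow such a script might use, assuming the main goal is to survive occasional timeouts: read the file in small chunks and retry each chunk with exponential backoff. `load_csv`, `chunks`, and the `insert_rows` callable are hypothetical; `insert_rows` stands in for whatever actually writes to Cassandra (for example a prepared INSERT executed through the DataStax driver), and catching every exception is a simplification.

```python
import csv
import itertools
import time
from typing import Callable, Iterable, Iterator, List, Sequence

Row = List[str]


def chunks(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def load_csv(path: str,
             insert_rows: Callable[[Sequence[Row]], None],
             chunk_size: int = 1000,
             max_retries: int = 5,
             base_delay: float = 1.0,
             skip_header: bool = True,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Insert a CSV chunk by chunk, retrying a failed chunk with exponential backoff.

    Returns the number of rows inserted. If a chunk still fails after
    max_retries retries, raises RuntimeError saying how many rows got in.
    """
    total = 0
    with open(path, newline="") as f:
        reader = csv.reader(f)
        if skip_header:
            next(reader, None)
        for chunk in chunks(reader, chunk_size):
            for attempt in range(max_retries + 1):
                try:
                    insert_rows(chunk)
                    break
                except Exception as exc:  # narrow to the driver's timeout errors in real use
                    if attempt == max_retries:
                        raise RuntimeError(f"giving up after {total} rows") from exc
                    sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ... by default
            total += len(chunk)
    return total
```

Backing off gives a single overloaded node time to catch up instead of aborting the whole import, and the row count in the error tells you where to resume. In a real loader, the driver's `cassandra.concurrent.execute_concurrent_with_args` is one way to send each chunk with bounded concurrency.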


Source: https://habr.com/ru/post/1612706/

