Python - cql - Cassandra 1.2 - rpc read timeouts

I have a Python application that talks to a Cassandra 1.2 cluster. The cluster has 7 physical nodes with virtual nodes enabled, and two keyspaces: one with a replication factor of 3 and one with a replication factor of 1. The application uses the cql library to connect to Cassandra and run queries. The problem is that reads have started failing with this error:

Request did not complete within rpc_timeout 
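Since the timeouts are intermittent, the application can at least avoid crashing on each one by retrying with backoff (this treats the symptom, not the overloaded node). A minimal sketch in plain Python; `RpcTimeout` is a stand-in name for whatever timeout exception the cql driver raises, and `with_retries` is a hypothetical helper, not part of the library:

```python
import time

class RpcTimeout(Exception):
    """Stand-in for the timeout error raised by the cql driver."""

def with_retries(fn, attempts=3, backoff=0.5):
    """Call fn(), retrying on RpcTimeout with exponential backoff."""
    for i in range(attempts):
        try:
            return fn()
        except RpcTimeout:
            if i == attempts - 1:
                raise  # out of attempts, propagate the timeout
            time.sleep(backoff * (2 ** i))

# Demo: a flaky call that times out twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RpcTimeout("Request did not complete within rpc_timeout")
    return "ok"

print(with_retries(flaky, attempts=4, backoff=0.01))  # prints "ok"
```

In practice you would wrap the cursor's `execute` call in `fn`; if the timeouts persist, the fix belongs on the server side, as the answers below discuss.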

When I check the status of the cluster, one of the nodes is at over 100% CPU, and its Cassandra system.log keeps printing this:

  INFO [ScheduledTasks:1] 2013-06-07 02:02:01,640 StorageService.java (line 3565) Unable to reduce heap usage since there are no dirty column families
  INFO [ScheduledTasks:1] 2013-06-07 02:02:02,642 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 630 ms for 1 collections, 948849672 used; max is 958398464
  WARN [ScheduledTasks:1] 2013-06-07 02:02:02,643 GCInspector.java (line 142) Heap is 0.9900367202591844 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
  INFO [ScheduledTasks:1] 2013-06-07 02:02:02,685 StorageService.java (line 3565) Unable to reduce heap usage since there are no dirty column families
  INFO [ScheduledTasks:1] 2013-06-07 02:02:04,224 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 1222 ms for 2 collections, 931216176 used; max is 958398464
  WARN [ScheduledTasks:1] 2013-06-07 02:02:04,224 GCInspector.java (line 142) Heap is 0.9716378009554072 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
  INFO [ScheduledTasks:1] 2013-06-07 02:02:04,225 StorageService.java (line 3565) Unable to reduce heap usage since there are no dirty column families
  INFO [ScheduledTasks:1] 2013-06-07 02:02:05,226 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 709 ms for 1 collections, 942735576 used; max is 958398464
  WARN [ScheduledTasks:1] 2013-06-07 02:02:05,227 GCInspector.java (line 142) Heap is 0.9836572275641711 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
  INFO [ScheduledTasks:1] 2013-06-07 02:02:05,229 StorageService.java (line 3565) Unable to reduce heap usage since there are no dirty column families
  INFO [ScheduledTasks:1] 2013-06-07 02:02:06,946 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 1271 ms for 2 collections, 939532792 used; max is 958398464
  WARN [ScheduledTasks:1] 2013-06-07 02:02:06,946 GCInspector.java (line 142) Heap is 0.980315419203343 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
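The numbers in the log are internally consistent: each "Heap is X full" fraction is just the GC line's `used` divided by `max`, and `max is 958398464` works out to exactly 914 MiB, so the node is running CMS collections back to back against an essentially full ~1 GB heap:

```python
# Figures taken verbatim from the GC log above (bytes).
used = 948849672
max_heap = 958398464

print(used / max_heap)    # 0.9900367202591844, matching the WARN line
print(max_heap // 2**20)  # 914 (MiB), i.e. just under 1 GB
```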

Any ideas on how to solve this?

Thanks in advance!

2 answers

It looks like the Cassandra JVM heap is too small, only about 914 MB:

 max is 958398464 

I would suggest increasing the heap to at least 2 GB, if your nodes have the memory to spare.

See cassandra-env.sh to find out how the JVM heap size is calculated automatically, or to set it to a specific value by hand.
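Concretely, cassandra-env.sh exposes two variables that, when set, override the automatic heap sizing; something like this (the 400M new-gen value is an illustrative choice, commonly sized at roughly 100 MB per CPU core):

```shell
# In cassandra-env.sh: setting both variables disables the
# automatic heap calculation and uses these values instead.
MAX_HEAP_SIZE="2G"
HEAP_NEWSIZE="400M"
```

Restart the node after changing these for the new heap size to take effect.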


What partitioner do you use, and what does your data schema look like? How many records do you have, and how many records should your query return? These are all things we need to know in order to give a proper answer to your question.

In Cassandra's case, the data model is very important. Cassandra is not like an RDBMS, where you can easily create an index on any column you want. Cassandra column families should be designed so that data is distributed evenly across the cluster nodes, avoiding hotspots and reads that hit only a single node of the cluster, which I think may be the cause of the rpc timeout in your case.
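To illustrate why a hashed partition key avoids hotspots: a partitioner maps each row key to a token, and tokens determine which node owns the row. A toy sketch in plain Python, using MD5 in the spirit of Cassandra's RandomPartitioner (Cassandra 1.2's default is actually Murmur3Partitioner; MD5 is used here only because it is in the standard library, and `node_for` is a simplified modulo placement, not real vnode token ranges):

```python
import hashlib

def token(key: bytes) -> int:
    """MD5-based token, as a stand-in for a real partitioner hash."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")

def node_for(key: bytes, num_nodes: int) -> int:
    """Toy placement: token modulo node count (not real vnode ranges)."""
    return token(key) % num_nodes

# Hashing spreads 10,000 distinct partition keys almost evenly
# across 7 nodes; a sequential or low-cardinality key would not.
counts = [0] * 7
for i in range(10000):
    counts[node_for(("user:%d" % i).encode(), 7)] += 1
print(counts)  # each count close to 10000/7 ≈ 1429
```

If instead most reads target one partition (or a badly chosen partition key funnels many rows to one token), that node does all the work, which matches the single pegged-CPU node in the question.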

If you can post the additional details above, please do.

Hope this helps.


Source: https://habr.com/ru/post/1484947/
