Poor Cassandra write performance

I am new to NoSQL and Cassandra. I am experimenting with settings to use it as an in-memory cache only. My test reads a file of 100,000 lines, line by line, and inserts each line into Cassandra using Hector. I see a very low throughput of about 6,000 inserts per second. The entire write operation takes about 20.5 seconds, which is unacceptable for our application. We need something like 100,000 inserts per second. I am testing on a Windows 7 machine with 4 GB of RAM.

The test performs inserts only.

Please let me know where I am going wrong and how I can improve the number of inserts per second.

Keyspace: Keyspace1
    Read Count: 0
    Read Latency: NaN ms.
    Write Count: 177042
    Write Latency: 0.003106884242157228 ms.
    Pending Tasks: 0
        Column Family: user
        SSTable count: 3
        Space used (live): 17691
        Space used (total): 17691
        Number of Keys (estimate): 384
        Memtable Columns Count: 100000
        Memtable Data Size: 96082090
        Memtable Switch Count: 1
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 177042
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 150000
        Key cache size: 0
        Key cache hit rate: NaN
        Row cache capacity: 150000
        Row cache size: 0
        Row cache hit rate: NaN
        Compacted row minimum size: 73
        Compacted row maximum size: 924
        Compacted row mean size: 784

I tried several ways of setting the row cache and key cache:

  • Via the Cassandra CLI

  • Via NodeCmd: java org.apache.cassandra.tools.NodeCmd -p 7199 setcachecapacity User Keyspace1 150000 150000

2 answers

I would not describe 6,000 writes per second as "slow", but Cassandra can do much better. Note, however, that Cassandra is designed for durable writes, so it may give lower performance than memory-only caching solutions.

As sbridges says, you cannot get full performance from Cassandra with a single client. Try using multiple client threads, processes, or machines.

I do not think you will get 100,000 writes per second on one node. I only got about 20,000-25,000 writes per second on modest hardware (although Cassandra has become significantly faster since I ran that benchmark). 6,000 per second seems about right for a single client against a single node.

With a cluster of nodes, you can get 100,000 per second (see http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html for a recent test reaching 1,000,000 writes per second!).

The row cache and key cache improve read performance, not write performance.

Also, make sure you are batching your writes (where applicable); this will reduce network overhead.
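To illustrate the batching idea, here is a minimal sketch that uses only the Java standard library. The `sendBatch` callback is a hypothetical stand-in for whatever issues one batched write through your client (with Hector, presumably a mutator that accumulates insertions and is executed once per batch; check the Hector documentation for the exact calls). The class name and the batch size of 500 are assumptions chosen for illustration, not tuned values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

class BatchInsertSketch {

    // Splits rows into consecutive chunks of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            int end = Math.min(i + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(i, end)));
        }
        return batches;
    }

    // Calls sendBatch once per chunk and returns the number of calls made.
    // sendBatch is hypothetical: it stands in for one batched client request.
    static <T> int insertInBatches(List<T> rows, int batchSize, Consumer<List<T>> sendBatch) {
        List<List<T>> batches = partition(rows, batchSize);
        for (List<T> batch : batches) {
            sendBatch.accept(batch);
        }
        return batches.size();
    }

    public static void main(String[] args) {
        // Stand-in data for the 100,000 lines read from the file.
        List<String> lines = Collections.nCopies(100_000, "row");
        int calls = insertInBatches(lines, 500, batch -> {
            // Replace this body with one batched write for the whole chunk.
        });
        System.out.println(lines.size() + " rows sent in " + calls + " batched calls");
    }
}
```

With 100,000 rows and a batch size of 500, the client makes 200 calls instead of 100,000. The best batch size depends on row size and cluster setup, so it is worth measuring a few values.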


How many threads / processes do you use to perform inserts? Hector calls are synchronous, so if you use only one thread on the client side, this could be your bottleneck.
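A minimal sketch of spreading blocking inserts across several client threads, using only the Java standard library. The `insertOne` callback is a hypothetical placeholder for a single synchronous insert call in your client code; the class name, the pool size of 8, and the 10-minute timeout are assumptions for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class ParallelInsertSketch {

    // Runs insertOne for every line on a fixed pool of worker threads and
    // returns the elapsed time in nanoseconds.
    // insertOne is hypothetical: it stands in for one blocking client insert.
    static long insertAll(List<String> lines, int threads, Consumer<String> insertOne)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (String line : lines) {
            pool.execute(() -> insertOne.accept(line));
        }
        // Stop accepting new tasks, then wait for the queued ones to finish.
        pool.shutdown();
        if (!pool.awaitTermination(10, TimeUnit.MINUTES)) {
            throw new IllegalStateException("inserts did not finish in time");
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in data for the lines read from the file.
        List<String> lines = Collections.nCopies(100_000, "row");
        AtomicInteger written = new AtomicInteger();
        // Replace the lambda body with the real synchronous insert.
        long nanos = insertAll(lines, 8, line -> written.incrementAndGet());
        System.out.println(written.get() + " inserts in " + nanos / 1_000_000 + " ms");
    }
}
```

Because each worker blocks on its own call, several requests are in flight at once. Whatever client object the workers share must be safe for concurrent use, and the right thread count is best found by measuring.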


Source: https://habr.com/ru/post/903196/

