How to increase Redis performance when the CPU is at 100%? Sharding? Fastest .NET client?

Due to a large increase in load on our site, Redis is now struggling with the load: the Redis server instance reaches 100% CPU (on one of the eight cores), which leads to timeouts.

We upgraded our client software to ServiceStack V3 (coming from BookSleeve 1.1.0.4) and upgraded the Redis server to 2.8.11 (from version 2.4.x). I chose ServiceStack because of Harbor.RedisSessionStateStore, which uses ServiceStack.Redis. We previously used AngiesList.Redis together with BookSleeve, but we hit 100% CPU with that as well.

We have eight Redis servers configured as a master/slave tree. One server is for the session state; the others are for the data cache: one master with two intermediate master/slave servers attached, each of which feeds two slaves of its own.

The servers hold about 600 client connections at peak, which is when they start to choke at 100% CPU.

What can we do to increase performance?

Sharding, and/or switching to the StackExchange.Redis client (aside from the session state)?

Or could it be something else? The session server also reaches 100% CPU, and it is not connected to any other servers (both data volume and network bandwidth are low).


Update 1: Analyzing redis-cli INFO output

Here is the output of the INFO command after Redis 2.8 had been running for one night.

# Server
redis_version:2.8.11 redis_git_sha1:00000000 redis_git_dirty:0 redis_build_id:7a57b118eb75b37f redis_mode:standalone os:Linux 2.6.32-431.11.2.el6.x86_64 x86_64 arch_bits:64 multiplexing_api:epoll gcc_version:4.4.7 process_id:5843 run_id:d5bb838857d61a9673e36e5bf608fad5a588ac5c tcp_port:6379 uptime_in_seconds:152778 uptime_in_days:1 hz:10 lru_clock:10765770 config_file:/etc/redis/6379.conf

# Clients
connected_clients:299 client_longest_output_list:0 client_biggest_input_buf:0 blocked_clients:0

# Memory
used_memory:80266784 used_memory_human:76.55M used_memory_rss:80719872 used_memory_peak:1079667208 used_memory_peak_human:1.01G used_memory_lua:33792 mem_fragmentation_ratio:1.01 mem_allocator:jemalloc-3.2.0

# Persistence
loading:0 rdb_changes_since_last_save:70245 rdb_bgsave_in_progress:0 rdb_last_save_time:1403274022 rdb_last_bgsave_status:ok rdb_last_bgsave_time_sec:0 rdb_current_bgsave_time_sec:-1 aof_enabled:0 aof_rewrite_in_progress:0 aof_rewrite_scheduled:0 aof_last_rewrite_time_sec:-1 aof_current_rewrite_time_sec:-1 aof_last_bgrewrite_status:ok aof_last_write_status:ok

# Stats
total_connections_received:3375 total_commands_processed:30975281 instantaneous_ops_per_sec:163 rejected_connections:0 sync_full:10 sync_partial_ok:0 sync_partial_err:5 expired_keys:8059370 evicted_keys:0 keyspace_hits:97513 keyspace_misses:46044 pubsub_channels:2 pubsub_patterns:0 latest_fork_usec:22040

# Replication
role:master connected_slaves:2 slave0:ip=xxx.xxx.xxx.xxx,port=6379,state=online,offset=272643782764,lag=1 slave1:ip=xxx.xxx.xxx.xxx,port=6379,state=online,offset=272643784216,lag=1 master_repl_offset:272643811961 repl_backlog_active:1 repl_backlog_size:1048576 repl_backlog_first_byte_offset:272642763386 repl_backlog_histlen:1048576

# CPU
used_cpu_sys:20774.19 used_cpu_user:2458.50 used_cpu_sys_children:304.17 used_cpu_user_children:1446.23

# Keyspace
db0:keys=77863,expires=77863,avg_ttl=3181732 db6:keys=11855,expires=11855,avg_ttl=3126767

Update 2: twemproxy (Sharding)

I found an interesting component called twemproxy. As I understand it, it can transparently shard data across several Redis instances.
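For reference, twemproxy (nutcracker) is configured with a small YAML file. A minimal sketch, assuming four local Redis instances on ports 6380-6383; the pool name, ports, and tuning values here are illustrative, not taken from the question:

```yaml
# nutcracker.yml: one pool named "alpha" sharding over four local instances
alpha:
  listen: 127.0.0.1:22121        # clients connect here instead of 6379
  hash: fnv1a_64
  distribution: ketama           # consistent hashing across the servers below
  auto_eject_hosts: true
  redis: true                    # speak the Redis protocol, not memcached
  server_retry_timeout: 2000
  server_failure_limit: 1
  servers:
   - 127.0.0.1:6380:1
   - 127.0.0.1:6381:1
   - 127.0.0.1:6382:1
   - 127.0.0.1:6383:1
```

One caveat worth checking for your setup: twemproxy does not proxy pub/sub commands (or MULTI/EXEC), so anything relying on SUBSCRIBE has to keep talking to a Redis instance directly.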

Would it help relieve the CPU?

It would save us a lot of programming time, but it would still take some time to configure 3 additional instances on each server. So I hope someone can confirm or debunk this approach before we start the work.

+3
3 answers

We found the problem inside our application. Updates to cached data were communicated to the in-memory cache local to each web server by subscribing to a Redis pub/sub channel.

Every time a local cache was flushed, items expired, or items were updated, messages were sent to all (35) web servers, which in turn started updating more items, and so on.

Disabling the messages for updated keys improved our situation tenfold.

Network bandwidth dropped from 1.2 Gbit/s to 200 Mbit/s, and CPU utilization is now around 40% even at 150% of our previous load, measured at a moment of extreme recalculations and updates.
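A rough back-of-envelope model of that fan-out, using the 35 servers from this answer but a hypothetical publish rate (the numbers are illustrative, not measured):

```python
# Back-of-envelope model of the pub/sub invalidation storm described above.
# Every published invalidation message is delivered to every other subscriber,
# so delivered traffic grows roughly with the square of the server count.

def invalidation_messages_per_sec(servers: int, updates_per_sec_per_server: int) -> int:
    """Messages delivered per second when each server publishes one
    invalidation per local update and all other servers subscribe."""
    published = servers * updates_per_sec_per_server
    delivered = published * (servers - 1)
    return delivered

# 35 web servers each publishing a hypothetical 100 invalidations/sec:
# 3500 publishes fan out into 119000 deliveries per second.
print(invalidation_messages_per_sec(35, 100))  # 119000
```

This is why removing the update messages helped so much: cutting the publish rate shrinks the delivered volume by a factor of the whole subscriber count.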

+3

My first, simple suggestion, if you haven't done so already, would be to disable all RDB and AOF persistence on your master, at least. Your slaves may of course lag behind if they still persist to disk. See this for an idea of the cost of RDB dumps.
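A minimal sketch of what that looks like in standard Redis configuration; the directives below are the stock redis.conf settings, applied only on the master:

```conf
# redis.conf on the master: disable RDB snapshots and AOF so the master
# never forks for a background save; let the slaves persist to disk instead.
save ""
appendonly no
```

The same change can be made at runtime without a restart via `redis-cli CONFIG SET save ""` and `redis-cli CONFIG SET appendonly no`.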

Another thing is to make sure you pipeline your commands. If you send a lot of commands individually that could be grouped into a pipeline, you should see an increase in performance.
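A toy latency model of why pipelining helps: when commands are sent one at a time, the network round trip usually dominates, not the server work. All numbers here are illustrative assumptions:

```python
# Compare total latency for N commands sent one by one vs. in one pipeline,
# given a fixed round-trip time (RTT) and a per-command server cost.

def sequential_ms(commands: int, rtt_ms: float, per_cmd_ms: float) -> float:
    # One full network round trip per command.
    return commands * (rtt_ms + per_cmd_ms)

def pipelined_ms(commands: int, rtt_ms: float, per_cmd_ms: float) -> float:
    # All commands share a single round trip; server work still adds up.
    return rtt_ms + commands * per_cmd_ms

# 100 GETs at an assumed 1 ms RTT and 0.25 ms of server time each:
print(sequential_ms(100, 1.0, 0.25))  # 125.0
print(pipelined_ms(100, 1.0, 0.25))   # 26.0
```

The model also shows why pipelining mostly helps the client and the network; the server still executes every command, so it reduces timeouts and connection pressure more than raw server CPU.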

Also, this SO post has a nice answer about profiling Redis.

More information about your use case and data structures would help in deciding whether simply changing the way you use Redis could give you an improvement.

Edit: In response to your last comment, it is worth noting that every time a slave loses its connection and reconnects, it re-synchronizes with the master. In previous versions of Redis this was always a full re-synchronization, so it was quite expensive. Apparently in 2.8 the slave can now request a partial re-synchronization of only the data it missed since the disconnection. I don't know all the details, but if your master or any of your slaves is not on 2.8.* and you have a shaky connection, this can really hurt your CPU performance by constantly forcing your master to re-sync slaves. Read more here.

+3

The first thing to do is look at slowlog get 50 (or pick any number of entries): this shows the last 50 commands that took non-trivial amounts of time. It may be that some of the things you are doing simply take too long. I get worried if I see anything in the slowlog; I usually see entries only every few days. If you are constantly seeing lots of entries, then you need to investigate what you are actually doing on the server. One killer command that you never need is keys, but there are others.
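For context, what lands in the slowlog is controlled by two standard settings in redis.conf; the values below are illustrative, not taken from the question:

```conf
# redis.conf: record any command taking longer than 10 ms (10000 microseconds)
# and keep the most recent 128 entries in memory.
slowlog-log-slower-than 10000
slowlog-max-len 128
```

You can then inspect the log with `redis-cli SLOWLOG GET 50` and clear it with `redis-cli SLOWLOG RESET`; lowering the threshold temporarily is a cheap way to see what the busiest commands are.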

The next thing to do: cache. Requests that are short-circuited before they reach the back end are free. We use Redis extensively, but that doesn't mean we ignore local memory.
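A minimal sketch of such a local short-circuit cache in front of a slower backend; `LocalCache` and the stub backend are hypothetical illustrations, not the poster's code:

```python
import time

class LocalCache:
    """Tiny in-process cache in front of a slower backend (e.g. a Redis GET).
    Local hits never reach the Redis server at all."""

    def __init__(self, backend_get, ttl_seconds: float = 30.0):
        self._backend_get = backend_get      # called only on a local miss
        self._ttl = ttl_seconds
        self._store = {}                     # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                  # local hit: no network call
        value = self._backend_get(key)       # miss: fall through to backend
        self._store[key] = (value, time.monotonic() + self._ttl)
        return value

# Stub backend that records how often it is actually hit.
calls = []
cache = LocalCache(lambda k: calls.append(k) or f"value:{k}")
cache.get("a"); cache.get("a"); cache.get("a")
print(len(calls))  # 1 -- the backend was touched only once
```

The TTL is the trade-off knob: as the earlier answer about pub/sub invalidation shows, a short TTL is often cheaper overall than broadcasting invalidation messages to every server.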

+2

Source: https://habr.com/ru/post/971047/
