How to optimize a redis-cli script to handle 50 million keys

Question

I wrote the bash script below to process Redis keys and values. I have about 45-50 million keys in my Redis instance. I want to retrieve all the values and do some processing. The script takes 1 hour to process 1 million keys, so processing 50 million keys would take 50 hours, which is not acceptable. I am new to redis-cli. Could someone please help me optimize the script below, or offer some suggestions?

My Redis key-value pattern:

Keys - 123.item.media
Values - 93839,abc,98,829 | 38282,yiw,282,282 | 8922,dux,382,993 |

Keys - 234.item.media
Values - 2122,eww,92,211 | 8332,uei,902,872 | 9039,uns,892,782 |

Keys - 839.item.media
Values - 7822,nkp,77,002 | 7821,mko,999,822 |

In the script below, I iterate over all my keys and count how many records each key has. For example, the key 123.item.media has 3 records and 839.item.media has 2.

So, for the keys and values above, the output should be: Total Count: 8

Doing the same for all 50 million keys takes too much time.

My code is:

#!/bin/sh
cursor=-1
keys=""
recordCount=0
while [ $cursor -ne 0 ];
do
    if [ $cursor -eq -1 ]
    then
        cursor=0
    fi
    reply=`redis-cli SCAN $cursor MATCH "*" COUNT 100`
    #echo $reply
    cursor=`expr "$reply" : '\([0-9]*[0-9 ]\)'`
    keys=${reply#[0-9]*[[:space:]]}
    for i in $keys
    do
        #echo $i
        #echo $keys
        value=$(redis-cli GET $i)
        temCount=`echo $value | awk -F\| '{print NF}'`
        #echo $temCount
        recordCount=`expr ${temCount} + ${recordCount}`
    done
done

echo "Total Count: " $recordCount

Thanks in advance for your help!

bash redis redis-cli query-performance

learn java Oct 28 '17 at 15:58

2 answers

codeforester · Answer 1 · 2017-10-28T18:55:05+0000

You are forking too many times in the loop, even for simple things such as arithmetic, which can be done with Bash builtins. When a loop that runs several million times does this, it slows things down considerably. For instance:

cursor=$(expr "$reply" : '\([0-9]*[0-9 ]\)')
temCount=$(echo $value | awk -F\| '{print NF}')
recordCount=$(expr ${temCount} + ${recordCount})
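
As a rough sketch, the three lines above could be replaced with builtins only, so that no extra process is started per key. The helper names add_records and parse_cursor are made up for illustration. The counting assumes every record ends with a '|', as in the question's sample values (with such values, awk's NF would report one extra, empty field per value).

```sh
#!/bin/sh
# Sketch: cursor parsing and record counting with shell builtins only
# (no expr, no awk, no command substitution inside the loop).

recordCount=0

# add_records VALUE: adds the number of '|' separators in VALUE to recordCount.
# Assumption: every record ends with '|', as in the question's sample data.
add_records() {
    _rest=$1
    while :; do
        case $_rest in
            *'|'*) _rest=${_rest#*'|'}      # drop everything up to the first '|'
                   recordCount=$((recordCount + 1)) ;;
            *)     break ;;
        esac
    done
}

# parse_cursor REPLY: sets cursor to the first whitespace-delimited word of REPLY.
parse_cursor() {
    cursor=${1%%[[:space:]]*}
}

add_records "93839,abc,98,829 | 38282,yiw,282,282 | 8922,dux,382,993 |"
add_records "2122,eww,92,211 | 8332,uei,902,872 | 9039,uns,892,782 |"
add_records "7822,nkp,77,002 | 7821,mko,999,822 |"
echo "Total Count: $recordCount"    # prints "Total Count: 8"
```

The functions write to global variables on purpose: capturing a function's output with $(...) would start a subshell per call, which is the overhead being avoided.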

I am not a redis expert. Based on my cursory understanding of redis-cli, you can do this:

redis-cli --scan | sort -u > all.keys
while read -r key; do
  value=$(redis-cli get "$key")
  # do your processing
done < all.keys

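If this does not speed things up, the next idea would be to split the all.keys file into chunks and process them in parallel. You may also want to look into the mget command, so that values are retrieved in batches rather than one by one.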
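
A minimal sketch of the batched variant, assuming all.keys holds one key per line and that keys contain no whitespace, quotes, or backslashes (xargs would misparse those). The function names sum_records and count_all, and the REDIS_CLI override variable, are hypothetical. It also assumes values contain no newlines, since redis-cli prints one MGET reply per line when its output is not a terminal.

```sh
#!/bin/sh
# sum_records: reads values (one per line) on stdin and prints the total
# number of '|' separators, using a single awk process for the whole stream.
sum_records() {
    awk '{ total += gsub(/[|]/, "") } END { print total + 0 }'
}

# count_all KEYFILE [BATCH]: fetches values BATCH keys at a time with MGET
# (default 1000) and counts the records in all of them.
# REDIS_CLI is a hypothetical hook for swapping in another client binary.
count_all() {
    xargs -n "${2:-1000}" "${REDIS_CLI:-redis-cli}" MGET < "$1" | sum_records
}

# Usage against a live server (not run here):
#   redis-cli --scan | sort -u > all.keys
#   count_all all.keys 1000
```

With a batch size of 1000, 50 million keys need about 50,000 redis-cli invocations instead of 50 million.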

Also, Bash may not be the best choice for this. There are probably better ways to do it in Python or Ruby.

DhruvPathak · Answer 2 · 2017-11-02T08:42:06+0000

Most of your time is spent making 50 million network calls for 50 million keys here:

value=$(redis-cli GET $i)

Instead, collect the GET commands into batches of, say, 1000 and send each batch in bulk using the --pipe option.

  --pipe             Transfer raw Redis protocol from stdin to server.
  --pipe-timeout <n> In --pipe mode, abort with error if after sending all data.
                     no reply is received within <n> seconds.

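The Redis documentation has an example of this for bulk inserts; bulk reads can be done along the same lines.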

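This should give a significant speedup, and your script should no longer need 50 hours. Try batch sizes of 1000, 10000, or 100000 to see which works best for your data.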
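
One way this could look, as a sketch rather than a tested recipe: generate the GET commands from a key list and stream them through a single redis-cli process. It feeds plain commands on standard input instead of using --pipe, on the assumption (worth checking on your version) that --pipe mode reports only a count of replies and errors, not the values themselves. The names emit_gets and count_pipes are hypothetical, and all.keys is assumed to hold one key per line with no whitespace or quote characters.

```sh
#!/bin/sh
# emit_gets: reads keys (one per line) on stdin and writes one
# "GET <key>" command per line.
emit_gets() {
    sed 's/^/GET /'
}

# count_pipes: prints how many '|' characters arrive on stdin.
# The $(( ... )) wrapper strips the padding some wc implementations add.
count_pipes() {
    echo $(( $(tr -cd '|' | wc -c) ))
}

# Usage against a live server (not run here):
#   emit_gets < all.keys | redis-cli | count_pipes
```

This uses one client process and one connection for the whole key list, so the per-key cost of starting redis-cli disappears. To try different batch sizes, the key file can be cut into pieces first (for example with split -l) and each piece fed through the same pipeline.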
