Count the number of matching properties in Redis

Question


I would like to find out whether migrating a dataset from PostgreSQL to Redis would speed up a particular search query. Unfortunately, I do not know how to organize the keys and values.

I want users to be able to provide a list of properties, and the application to return a list of items ordered ascending by the number of each item's properties that were not entered.

For instance:

    Item #1 { prop1, prop2, prop4, prop7 }    Query:  "prop1 prop3 prop4 prop5"
    Item #2 { prop7, prop8 }                  Result: Item #3
    Item #3 { prop2, prop3, prop5 }                   Item #1
                                                      Item #2
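
To make the expected ordering concrete, here is a minimal in-memory Python sketch (no Redis involved). The dictionary layout and the `rank` helper are illustrative assumptions, not part of the actual dataset; the sketch only counts, for each item, the properties that are absent from the query.

```python
# Hypothetical in-memory model of the example above; names are illustrative only.
items = {
    "Item #1": {"prop1", "prop2", "prop4", "prop7"},
    "Item #2": {"prop7", "prop8"},
    "Item #3": {"prop2", "prop3", "prop5"},
}
query = {"prop1", "prop3", "prop4", "prop5"}


def rank(items, query):
    """Order item names by how many of their properties are absent from the query."""
    # sorted() is stable, so items with equal counts keep their insertion order.
    return sorted(items, key=lambda name: len(items[name] - query))


ranking = rank(items, query)
print(ranking)  # ['Item #3', 'Item #1', 'Item #2']
```

Item #3 has one property outside the query (prop2), while Item #1 and Item #2 each have two, which reproduces the result listed in the example.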

What I have come up with so far:

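    #!/usr/bin/python
    import redis

    r = redis.Redis()  # assumes a Redis server on localhost:6379

    properties = (1, 3, 4, 5)
    # One sorted set per property, holding the related items with a score of 1
    items = ["Properties:%s:items" % prop for prop in properties]
    r.zunionstore("query:related_items", items)
    # Resulting score = (total properties) - (matched properties)
    r.zinterstore("query:result", {"Items:all": 1, "query:related_items": -1})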

This creates a sorted set of the Items (each with a score of 1) that are related to the Properties entered by the user. Then the intersection with a sorted set of all Items is calculated (where each score is the Item's total number of Properties). The weights are chosen so that an Item gets a score of 0 if all of its Properties were specified in the query.

Since there are about 600,000 Items, this query takes approximately 4-6 seconds. Is there a better way to do this?
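
The score arithmetic can be checked without a server. The following is a rough in-memory sketch in which plain dicts stand in for sorted sets; `zunion` and `zinter` are hypothetical helpers that imitate ZUNIONSTORE and ZINTERSTORE with SUM aggregation, and the sample data is taken from the example items.

```python
def zunion(zsets, weights=None):
    """Model ZUNIONSTORE with SUM aggregation: keep every member, add weighted scores."""
    weights = weights or [1] * len(zsets)
    out = {}
    for zset, w in zip(zsets, weights):
        for member, score in zset.items():
            out[member] = out.get(member, 0) + w * score
    return out


def zinter(zsets, weights=None):
    """Model ZINTERSTORE with SUM aggregation: keep only members present in every input."""
    weights = weights or [1] * len(zsets)
    common = set(zsets[0]).intersection(*zsets[1:])
    return {m: sum(w * z[m] for z, w in zip(zsets, weights)) for m in common}


# Assumed layout: Properties:<id>:items -> each related item with a score of 1
prop_items = {
    1: {"item1": 1},
    2: {"item1": 1, "item3": 1},
    3: {"item3": 1},
    4: {"item1": 1},
    5: {"item3": 1},
    7: {"item1": 1, "item2": 1},
    8: {"item2": 1},
}
# Assumed layout: Items:all -> each item scored by its total number of properties
items_all = {"item1": 4, "item2": 2, "item3": 3}

query = (1, 3, 4, 5)
related = zunion([prop_items[p] for p in query])
result = zinter([items_all, related], weights=[1, -1])
print(sorted(result.items(), key=lambda kv: kv[1]))
```

In this model item3 ends up with a score of 1 and item1 with a score of 2. item2 matches none of the queried properties, so it is missing from the union and therefore from the intersection as well, because an intersection keeps only members present in every input set.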


redis

jnns Dec 18 '11 at 14:38


2 answers

Ezekiel templin · Answer 1 · 2012-01-11T20:31:59+0000

I assume you are looking for a Python solution, but the Ohm library for Ruby is my favorite of the Redis-based database analogues. Given the similarities between Python and Ruby, and Ohm's excellent documentation, you might find some inspiration there.

Niloct · Answer 2 · 2012-01-13T23:30:17+0000

EDIT: Real properties are used as indicated in the comments.

I think I did it (again). I used PHPRedis.

I also used sorted sets, but I inverted your schema: each zset represents an ingredient, and each recipe identifier is a member of that zset. Thus every zset has the same number of members, namely every recipe in the application. Each recipe either uses an ingredient or does not, and that determines the score.

Loading is somewhat expensive, but the query completes in under 3 s for a sample with 12 ingredients and 600,000 recipes (you have a lot of them!).

Loading

Pseudocode:

    For every ingredient i in the system
        For every recipe j in the system
            If recipe j uses ingredient i Then
                score = 1
                INCR recipe:j:ing_count      // Will help sorting
                RPUSH recipe:j:ing_list i    // For listing all ingredients in a recipe
            Else
                score = 0
            End if
            ZADD ing:i score j
        End for
    End for

Code:

    #!/usr/bin/php
    <?php
    ### Total number of ingredients
    define('NUM_OF_ING', 12);
    ### Total number of recipes
    define('NUM_OF_RECIPES', 600000);

    $redis = new \Redis();
    $redis->connect('localhost');

    for ($ing = 1; $ing <= NUM_OF_ING; $ing++) {
        for ($recipe = 1; $recipe <= NUM_OF_RECIPES; $recipe++) {
            // Randomly decide whether this recipe uses this ingredient
            $score = rand() % 2;
            if ($score == 1) {
                $redis->incr("recipe:$recipe:ing_count");
                $redis->rpush("recipe:$recipe:ing_list", $ing);
            }
            $redis->zAdd("ing:$ing", $score, $recipe);
        }
    }
    echo "Done.\n";
    ?>
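
For readers following along in Python, the loading step can be modeled in memory. This is only a sketch under assumptions: plain dicts stand in for the Redis keys, the `load` helper and the tiny sample data are hypothetical, and real recipe data replaces the random scores used in the PHP script.

```python
from collections import defaultdict


def load(recipes, ingredients):
    """Build the inverted layout: one zset per ingredient, every recipe a member."""
    zsets = {}                     # models the "ing:<i>" sorted sets
    ing_count = defaultdict(int)   # models "recipe:<j>:ing_count"
    ing_list = defaultdict(list)   # models "recipe:<j>:ing_list"
    for i in ingredients:
        zset = zsets.setdefault(f"ing:{i}", {})
        for j, used in recipes.items():
            score = 1 if i in used else 0
            if score:
                ing_count[j] += 1      # INCR
                ing_list[j].append(i)  # RPUSH
            zset[j] = score            # ZADD
    return zsets, dict(ing_count), dict(ing_list)


# Hypothetical sample data: recipe id -> set of ingredient ids
recipes = {1: {1, 2}, 2: {2}, 3: {1, 2, 3}}
zsets, ing_count, ing_list = load(recipes, ingredients=[1, 2, 3])
print(zsets["ing:1"])
print(ing_count)
```

Every modeled zset contains all three recipes, with a score of 1 where the recipe uses the ingredient and 0 where it does not, which is the property the query step relies on.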

Querying

Before pasting the PHP code and the measured running time, let me make a few comments:

Sorting is based on the number of ingredients used (the sum of the scores across the zsets in the query). If two recipes use all the ingredients in the query, the tie is broken by the number of additional ingredients each recipe has: more ingredients, higher position.

The sum is computed by ZINTERSTORE. The zset with the sums is stored in result.

The SORT command then looks up the counter key for each recipe, adjusting the order by this additional constraint.

Code:

    #!/usr/bin/php
    <?php
    $redis = new \Redis();
    $redis->connect('localhost');

    // Properties in the query
    $query = array('ing:2', 'ing:4', 'ing:5');
    $weights = array(1, 1, 1);

    // Intersection with summed scores (ZINTERSTORE; older phpredis releases named this method zInter)
    $redis->zInterStore('result', $query, $weights, 'sum');

    // Sorting
    echo "Result:\n";
    var_dump($redis->sort('result', array(
        'by'    => 'recipe:*:ing_count',
        'sort'  => 'desc',
        'limit' => array(0, 10),
    )));
    echo "End.\n";
    ?>

Output and running time:

    niloct@Impulse-Ubuntu:~$ time ./final2.php
    Result:
    array(10) {
      [0]=>
      string(4) "5230"
      [1]=>
      string(5) "79549"
      [2]=>
      string(4) "2871"
      [3]=>
      string(3) "336"
      [4]=>
      string(6) "109279"
      [5]=>
      string(4) "5352"
      [6]=>
      string(5) "16868"
      [7]=>
      string(3) "690"
      [8]=>
      string(4) "3174"
      [9]=>
      string(4) "8795"
    }
    End.

    real    0m2.930s
    user    0m0.016s
    sys     0m0.004s
    niloct@Impulse-Ubuntu:~$ redis-cli lrange recipe:5230:ing_list 0 -1
     1) "12"
     2) "11"
     3) "10"
     4) "9"
     5) "8"
     6) "7"
     7) "6"
     8) "5"
     9) "4"
    10) "3"
    11) "2"
    12) "1"
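
The ordering rule described in the comments (summed score first, then total ingredient count as the tie-break) can be modeled in plain Python. This is a sketch of the rule only, not of how the Redis SORT command executes; the `query` helper, the final tie-break on recipe id, and the sample data are all hypothetical.

```python
def query(zsets, ing_count, keys, limit=10):
    """Rank recipes by summed score over the queried zsets, ties broken by ing_count."""
    # Every zset holds every recipe, so the intersection is the full member set.
    members = set(zsets[keys[0]]).intersection(*(zsets[k] for k in keys[1:]))
    sums = {m: sum(zsets[k][m] for k in keys) for m in members}
    # Highest sum first, then highest ingredient count, then recipe id for determinism.
    ranked = sorted(members, key=lambda m: (-sums[m], -ing_count.get(m, 0), m))
    return ranked[:limit]


# Hypothetical sample in the layout of the loading step: "ing:<i>" -> {recipe: score}
zsets = {
    "ing:1": {1: 1, 2: 0, 3: 1, 4: 1},
    "ing:2": {1: 1, 2: 1, 3: 1, 4: 1},
    "ing:3": {1: 0, 2: 0, 3: 1, 4: 0},
}
ing_count = {1: 2, 2: 1, 3: 3, 4: 2}

top = query(zsets, ing_count, ["ing:1", "ing:2"])
print(top)  # [3, 1, 4, 2]
```

Recipes 1, 3 and 4 all use both queried ingredients; recipe 3 comes first because it has the most ingredients overall, and recipe 2 comes last because it uses only one of the two.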

Hope this helps.

PS: Could you post your own performance measurements once you have tried this?
