EDIT: Real properties are used as indicated in the comments.
I think I got it (again). I used PHPRedis.
I also used sorted sets, but I inverted your schema: each zset represents an ingredient, and each recipe identifier is a member of that zset. Thus, each zset has the same number of members: one per recipe in the application. Each recipe either uses an ingredient or does not, and that determines the score (1 or 0).
Loading is somewhat expensive, but the query completes in under 3 s for a sample with 12 ingredients and 600,000 recipes (you sure have a lot of them!).
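To make the inverted schema concrete, here is a minimal in-memory sketch that needs no Redis server. The recipe data and the helper name `buildIngredientZsets` are made up for illustration; plain PHP arrays stand in for the zsets.

```php
<?php
// Hypothetical sample data: recipe id => list of ingredient ids it uses.
$recipes = [
    1 => [1, 2],
    2 => [2],
    3 => [1, 2, 3],
];
$ingredients = [1, 2, 3];

// Inverted schema: one "zset" per ingredient. Every recipe id is a member,
// scored 1 if the recipe uses the ingredient and 0 otherwise.
function buildIngredientZsets(array $recipes, array $ingredients): array
{
    $zsets = [];
    foreach ($ingredients as $ing) {
        foreach ($recipes as $recipeId => $used) {
            $zsets["ing:$ing"][$recipeId] = in_array($ing, $used, true) ? 1 : 0;
        }
    }
    return $zsets;
}

$zsets = buildIngredientZsets($recipes, $ingredients);
print_r($zsets['ing:1']);
```

Every `ing:N` array ends up with one entry per recipe, which is why all zsets have the same size.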
Loading
Pseudocode:

    For every ingredient i on the system
        For every recipe j on the system
            If recipe j uses the ingredient i Then
                score = 1
                INCR recipe:j:ing_count      // Will help sorting
                RPUSH recipe:j:ing_list i    // For listing all ingredients in recipe
            Else
                score = 0
            End if
            ZADD ing:i score j
        End for
    End for
Code:

    #!/usr/bin/php
    <?php
    // Total number of ingredients
    define('NUM_OF_ING', 12);
    // Total number of recipes
    define('NUM_OF_RECIPES', 600000);

    $redis = new \Redis();
    $redis->connect('localhost');

    for ($ing = 1; $ing <= NUM_OF_ING; $ing++) {
        for ($recipe = 1; $recipe <= NUM_OF_RECIPES; $recipe++) {
            // Sample data: each recipe randomly uses (1) or does not use (0) the ingredient
            $score = rand() % 2;
            if ($score == 1) {
                $redis->incr("recipe:$recipe:ing_count");
                $redis->rpush("recipe:$recipe:ing_list", $ing);
            }
            $redis->zAdd("ing:$ing", $score, $recipe);
        }
    }
    echo "Done.\n";
    ?>
Queries
Before pasting the PHP code and the measured running time, let me make a few observations:
Sorting is based on the number of queried ingredients each recipe uses (the sum of its scores across the zsets in the query). If two recipes use all the ingredients in the query, the tie is broken by the number of additional ingredients each recipe has: more ingredients, higher position.
The sum is computed by ZINTERSTORE, which stores the zset of sums in the key result.
The SORT command then looks up the counter key (recipe:*:ing_count) for each recipe, adjusting the order by this additional constraint.
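As a sanity check on a small data set, the ordering described above can be modelled without Redis. This is only a sketch under assumptions: `interSum` and `rankRecipes` are hypothetical helpers, plain arrays stand in for the zsets and the `ing_count` keys, and the recipe ids and scores are invented.

```php
<?php
// Models ZINTERSTORE with AGGREGATE SUM on in-memory "zsets":
// keeps members present in every set and adds up their scores.
function interSum(array $zsets): array
{
    $result = array_shift($zsets) ?? [];
    foreach ($zsets as $zset) {
        $next = [];
        foreach ($result as $member => $score) {
            if (array_key_exists($member, $zset)) {
                $next[$member] = $score + $zset[$member];
            }
        }
        $result = $next;
    }
    return $result;
}

// Orders recipe ids by summed score, then by total ingredient count,
// both descending, and returns at most $limit ids.
function rankRecipes(array $sums, array $ingCount, int $limit): array
{
    $ids = array_keys($sums);
    usort($ids, function ($a, $b) use ($sums, $ingCount) {
        return [$sums[$b], $ingCount[$b]] <=> [$sums[$a], $ingCount[$a]];
    });
    return array_slice($ids, 0, $limit);
}

// Hypothetical data: three recipes, query on ingredients 2 and 4.
$zsets = [
    'ing:2' => [101 => 1, 102 => 1, 103 => 0],
    'ing:4' => [101 => 1, 102 => 1, 103 => 1],
];
$ingCount = [101 => 2, 102 => 5, 103 => 3];

$top = rankRecipes(interSum($zsets), $ingCount, 10);
print_r($top);
```

Recipes 101 and 102 both match the two queried ingredients, so 102 wins the tie on its larger ingredient count, and 103 comes last. If I read the SORT documentation correctly, BY orders by the value of the external key, so it is worth confirming on real data that the Redis query ranks the way this model does.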
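Code:

    #!/usr/bin/php
    <?php
    $redis = new \Redis();
    $redis->connect('localhost');

    // Ingredients in the query
    $query = array('ing:2', 'ing:4', 'ing:5');
    $weights = array(1, 1, 1);

    // Intersection (ZINTERSTORE): sums the scores into the key "result".
    // Older phpredis versions also accepted zInter as an alias for this call.
    $redis->zInterStore('result', $query, $weights, 'sum');

    // Sorting: order by each recipe's ingredient counter, top 10
    echo "Result:\n";
    var_dump($redis->sort('result', array(
        'by'    => 'recipe:*:ing_count',
        'sort'  => 'desc',
        'limit' => array(0, 10),
    )));
    echo "End.\n";
    ?>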
Output and running time:

    niloct@Impulse-Ubuntu:~$ time ./final2.php
    Result:
    array(10) {
      [0]=>
      string(4) "5230"
      [1]=>
      string(5) "79549"
      [2]=>
      string(4) "2871"
      [3]=>
      string(3) "336"
      [4]=>
      string(6) "109279"
      [5]=>
      string(4) "5352"
      [6]=>
      string(5) "16868"
      [7]=>
      string(3) "690"
      [8]=>
      string(4) "3174"
      [9]=>
      string(4) "8795"
    }
    End.

    real    0m2.930s
    user    0m0.016s
    sys     0m0.004s
    niloct@Impulse-Ubuntu:~$ redis-cli lrange recipe:5230:ing_list 0 -1
     1) "12"
     2) "11"
     3) "10"
     4) "9"
     5) "8"
     6) "7"
     7) "6"
     8) "5"
     9) "4"
    10) "3"
    11) "2"
    12) "1"
Hope this helps.
PS: Can you post your performance evaluations after that?