I am trying to build a feature that displays the pages most viewed by a user's friends. My friendships table has 5.7 M rows, and the views table has 5.3 M rows. For now, I just want to run a query against these two tables and find the 20 page IDs most viewed by a given person's friends.
Here is the query that I have now:
SELECT page_id
FROM `views`
INNER JOIN `friendships` ON friendships.receiver_id = views.user_id
WHERE (`friendships`.`creator_id` = 143416)
GROUP BY page_id
ORDER BY count(views.user_id) DESC
LIMIT 20
And here is the EXPLAIN output:
+----+-------------+-------------+------+-----------------------------------------+---------------------------------+---------+-------------------------+------+----------------------------------------------+
| id | select_type | table       | type | possible_keys                           | key                             | key_len | ref                     | rows | Extra                                        |
+----+-------------+-------------+------+-----------------------------------------+---------------------------------+---------+-------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | friendships | ref  | PRIMARY,index_friendships_on_creator_id | index_friendships_on_creator_id | 4       | const                   |  271 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | views       | ref  | PRIMARY                                 | PRIMARY                         | 4       | friendships.receiver_id |   11 | Using index                                  |
+----+-------------+-------------+------+-----------------------------------------+---------------------------------+---------+-------------------------+------+----------------------------------------------+
The views table has a primary key on (user_id, page_id), and you can see that it is being used. The friendships table has a primary key on (receiver_id, creator_id) and a secondary index on (creator_id).
If I run this query without the GROUP BY and LIMIT, it returns about 25,000 rows for this particular user. This is typical.
In the most recent real run, this query took 7 seconds, which is far too long for a decent response time in a web application.
One thing I'm curious about is adding a secondary index on (creator_id, receiver_id). I'm not sure whether it would give much of a performance gain. I will probably try it today, depending on the answers to this question.
Can you see a way to rewrite the query to make it lightning fast?
Update: I need to do more tests on this, but it seems that my nasty query performs better if I don't do the grouping and sorting in the database, and instead do it in Ruby afterwards. The total time is much shorter, about 80% less it seems. My early testing may have been flawed, so this definitely needs more investigation. If it is true, then wtf is MySQL doing?
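To make the Ruby-side approach concrete, here is a minimal sketch of the grouping and sorting step. It assumes the ungrouped query has already been run and its page_id column loaded into a plain array; the method name top_pages and the sample data are hypothetical, not taken from my actual code.

```ruby
# Hypothetical helper: counts how often each page_id appears in the
# ungrouped join result and returns the most frequently viewed ones.
def top_pages(page_ids, limit = 20)
  counts = Hash.new(0)
  page_ids.each { |page_id| counts[page_id] += 1 }

  # Sort by descending view count; ties are broken by page_id so the
  # result is deterministic.
  counts.sort_by { |page_id, count| [-count, page_id] }
        .first(limit)
        .map { |page_id, _count| page_id }
end

# Made-up sample standing in for the ~25,000 rows of the real result set.
sample = [7, 3, 7, 9, 3, 7]
top_pages(sample, 2) # => [7, 3]
```

This is a single pass over the rows plus a sort of the distinct page IDs, so on 25,000 rows it should be cheap; whether it really beats the database-side GROUP BY still needs the additional testing mentioned above.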