MySQL: data duplication for better performance?

I have a large table (200'000'000 rows), declared as

thread( forum_id tinyint, thread_id int, date_first datetime, date_last datetime, replycount mediumint, extra blob ) 

forum_id and thread_id together form the primary key. For large forums (about a million topics), I sometimes have to run queries like SELECT thread_id FROM thread ORDER BY date_last DESC LIMIT 500000, 10 . These queries with huge offsets take a second, or sometimes several, to complete.

So, by duplicating data, I could create separate tables for the forums with the most threads to speed this up. Only a few forums contain more than 100,000 topics, so there would be a table like

  thread_for_forumid_123456 ( thread_id int, date_first datetime, date_last datetime, replycount mediumint ) 

What do you think about this? Will it speed up those huge-offset queries? Do you have any other suggestions? Thanks.

+4
3 answers

First of all, I would REALLY try to avoid your approach. I see it as a last resort for solving performance issues.

You have alternatives, from hardware to software. You could buy a Fusion-io card or just an SSD ( Raid vs SSD vs FusionIO ), but you can also solve this in software without spending anything. In your scenario you should use a cache (e.g. memcached ) if you are not already doing so. MySQL also has partitioning ; it is not the best implementation in the world, but it can give you a good performance improvement.

BUT, if you do go with this idea, I suggest you shard your data using a value that distributes it across tables more evenly. You could do something pragmatic: create 50 tables, thread_0 through thread_49 , and route each row with (forum_id % 50) to one of those 50 tables. That way you avoid creating a table every time a forum is created, and you avoid ending up with N tables. A SELECT on an indexed forum_id will still be very fast. In addition, you can add some logic in the application to control pagination and avoid huge offsets .
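The modulo routing described above can be sketched in a few lines (a minimal sketch: the helper name is hypothetical, and the 50-bucket count and thread_N table names follow the answer's example):

```python
def thread_table(forum_id: int, buckets: int = 50) -> str:
    """Pick one of `buckets` fixed tables (thread_0 .. thread_49) for a forum.

    Every row of a given forum always lands in the same table, so a query
    for that forum only ever has to touch one shard.
    """
    return f"thread_{forum_id % buckets}"

# Example: forum 123456 is routed to table thread_6 (123456 % 50 == 6).
print(thread_table(123456))
```

Because the bucket is derived from forum_id, no lookup table is needed; the trade-off is that rebalancing later (changing the bucket count) requires moving rows between tables.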

Good luck

P.S.: I'll add a link to a post on MySQLPerformanceBlog, "Why you don't want to shard".

+2

It looks like your problem is the large offset in LIMIT 500000, 10 .

I don't understand why you have an offset of 500'000 here. If you filter on forum_id, the offset should be much smaller, because I don't believe you have that many threads in one forum.

 SELECT thread_id FROM thread WHERE forum_id = 123456 ORDER BY date_last DESC LIMIT 10, 50 

Perhaps you can also take a look at http://dev.mysql.com/doc/refman/5.1/en/limit-optimization.html
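The usual way to "avoid huge offsets" entirely is keyset (seek) pagination: remember the last date_last you showed and filter on it instead of skipping rows with OFFSET. A minimal sketch, with SQLite standing in for MySQL and a hypothetical page helper (real data would also need a unique tiebreaker column, since two threads can share the same date_last):

```python
import sqlite3

# Toy version of the question's table: 30 threads in forum 1,
# with distinct, sortable date_last values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thread (forum_id INT, thread_id INT, date_last TEXT)")
conn.executemany(
    "INSERT INTO thread VALUES (?, ?, ?)",
    [(1, i, f"2011-01-{i:02d}") for i in range(1, 31)],
)

def page(conn, forum_id, before=None, size=10):
    """Return (thread_id, date_last) rows newest-first, starting below `before`."""
    if before is None:
        return conn.execute(
            "SELECT thread_id, date_last FROM thread "
            "WHERE forum_id = ? ORDER BY date_last DESC LIMIT ?",
            (forum_id, size),
        ).fetchall()
    # Seek predicate replaces OFFSET: the index on (forum_id, date_last)
    # can jump straight to the right spot instead of scanning 500'000 rows.
    return conn.execute(
        "SELECT thread_id, date_last FROM thread "
        "WHERE forum_id = ? AND date_last < ? "
        "ORDER BY date_last DESC LIMIT ?",
        (forum_id, before, size),
    ).fetchall()

first = page(conn, 1)                         # threads 30 .. 21
second = page(conn, 1, before=first[-1][1])   # threads 20 .. 11
```

The application passes the last date_last of the current page as the "next page" cursor, so every page costs the same regardless of how deep the user scrolls; the limitation is that you can only step page by page, not jump to page 50'000 directly.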

0

MySQL partitioning sounds like the feature you may want to consider.

0

Source: https://habr.com/ru/post/1384910/

