MySQL "SELECT DISTINCT" Efficiency for very large tables

Question

I have a very large table (millions of records) whose primary key consists of approximately 8 fields. For simplicity, let's say the table looks like this:

key_1 | key_2 | key_3 | ... | key_8 | value

Given a value for key_1, I need to get all possible values for key_2, key_3, ..., key_8, something along the following lines:

  SELECT DISTINCT key_2 FROM table1 WHERE key_1 = 123;
  SELECT DISTINCT key_3 FROM table1 WHERE key_1 = 123;
  ...
  SELECT DISTINCT key_8 FROM table1 WHERE key_1 = 123;

My problem is that these queries take much longer than I can afford, even though the data in this table is fairly constant and rarely updated (once every few days). Also, table1 could itself be a slow subquery. Short of creating an additional table in the database and manually updating it every time the database is updated, is there another solution that can give me fast results? I need it to work across multiple MySQL sessions.

+6

performance mysql

Smartelf May 29 '12 at 13:44

2 answers

 SELECT DISTINCT key_2 FROM table1 WHERE key_1 = 123;

This can use your primary key index (key_1, key_2, etc.), because key_1 and key_2 form a leftmost prefix of it. It will perform an index scan, which is faster than a table scan or a temporary table.

 SELECT DISTINCT key_3 FROM table1 WHERE key_1 = 123;

This cannot use the primary key in the same way, because the combination of key_1 and key_3 does not form a leftmost prefix of the primary key. You need to create a composite index on key_1 and key_3, in that order. MySQL can then use this index to perform an index scan as well.

 SELECT DISTINCT key_8 FROM table1 WHERE key_1 = 123;

This requires an index on key_1 and key_8, in that order, for the same reason as above.
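As a minimal sketch of the composite indexes described above: the index names idx_key1_key3 and idx_key1_key8 are made up for illustration, and the same pattern would apply to the other non-prefix columns.

```sql
-- Hypothetical index names. Column order matters: key_1 (the filter)
-- comes first, followed by the column whose distinct values are wanted.
CREATE INDEX idx_key1_key3 ON table1 (key_1, key_3);
CREATE INDEX idx_key1_key8 ON table1 (key_1, key_8);

-- These queries now have an index whose leftmost prefix matches them.
SELECT DISTINCT key_3 FROM table1 WHERE key_1 = 123;
SELECT DISTINCT key_8 FROM table1 WHERE key_1 = 123;
```

Each extra index costs disk space and slows writes somewhat, which may be acceptable here given that the table is updated only once every few days.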

+2

Marcus Adams May 29 '12 at 14:21

Konerak · Accepted Answer · 2012-05-29T13:52:52+0000

It is impossible to give a definitive answer with the information we have, but start with these:

Do you have an index on key_1?

Without it, each query will be slow on its own, just to look up 123.

Do you have an index on (key_1, key_2)?

Because SELECT DISTINCT key_2 ... WHERE key_1 = 123 is really fast if it can get all the data it needs from the index alone. There is no need to touch the table.

Are the rows / indexes fixed-size?

Traversing a table with fixed-size rows can be faster, because the position of the xth record is always known by calculating an offset. Tables with variable-size rows are slower.

Have you tried adding a surrogate auto-increment primary key?

Indexes work better when all they have to store is the indexed column and a small primary key. Composite primary keys are slower.
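A rough sketch of the surrogate-key suggestion, under several assumptions: the column name id, the type INT UNSIGNED, and the constraint name uq_all_keys are all invented here, and the eight key columns are assumed to be named key_1 through key_8 as in the simplified table from the question.

```sql
-- Assumption: replace the 8-column primary key with a small surrogate key,
-- and keep the original uniqueness guarantee as a UNIQUE index.
-- On a table with millions of rows this rebuild can take a while.
ALTER TABLE table1
  DROP PRIMARY KEY,
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST,
  ADD UNIQUE KEY uq_all_keys
    (key_1, key_2, key_3, key_4, key_5, key_6, key_7, key_8);
```

Because uq_all_keys still starts with (key_1, key_2), the first query from the question keeps a matching index prefix after the change.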

Is the table read-only?

You can pack MyISAM tables for fast access, but they become read-only. It is a hack, but it has its uses.

One step further: have you considered a data warehouse?

If the tables do not change frequently, it may be best to duplicate the information for quick access.

Can you post the output of SHOW CREATE TABLE? Seeing the columns and indexes would help. Can you post the output of EXPLAIN SELECT? Seeing which indexes are used would help.
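One way to duplicate the information is a small lookup table per column. This is only a sketch: the table name table1_key_3 and the INT column types are assumptions, and the refresh would have to be rerun after each update of table1 (every few days, per the question).

```sql
-- Hypothetical lookup table holding each (key_1, key_3) pair exactly once.
CREATE TABLE table1_key_3 (
  key_1 INT NOT NULL,
  key_3 INT NOT NULL,
  PRIMARY KEY (key_1, key_3)
);

-- Refresh step: rebuild the lookup table from the large table.
TRUNCATE TABLE table1_key_3;
INSERT INTO table1_key_3 (key_1, key_3)
SELECT DISTINCT key_1, key_3 FROM table1;

-- The lookup no longer needs DISTINCT and reads only the small table.
SELECT key_3 FROM table1_key_3 WHERE key_1 = 123;
```

This is the approach the question hoped to avoid, but it also works when table1 is really a slow subquery that cannot be indexed directly.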
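The two diagnostic statements asked for above would look roughly like this, using the table and column names from the question:

```sql
-- Shows the full table definition, including every index and the engine.
SHOW CREATE TABLE table1;

-- Shows the execution plan. In the output, the "key" column names the
-- index MySQL chose, and "Using index" in the "Extra" column means the
-- query was answered from the index alone, without reading table rows.
EXPLAIN SELECT DISTINCT key_3 FROM table1 WHERE key_1 = 123;
```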
