Use a timestamp (or date-time) as part of the primary key (or part of a clustered index)

I often use the following query:

SELECT * FROM table WHERE Timestamp > [SomeTime] AND Timestamp < [SomeOtherTime] AND publish = 1 AND type = 2 ORDER BY Timestamp

I would like to optimize this query. My idea is to make the timestamp part of the primary key, and therefore part of the clustered index. I think that if the timestamp is part of the primary key, rows inserted into the table are written to disk sequentially by the timestamp field. I expect this to greatly improve my query, but I'm not sure whether it will actually help.

The table has 3-4 million+ rows. The timestamp field never changes. I use MySQL 5.6.11.

Another point: if this improves my query, is it better to use a TIMESTAMP (4 bytes in MySQL 5.6) or a DATETIME (5 bytes in MySQL 5.6)?

+4
2 answers

1) If the timestamp values are unique, you can make the column the primary key. If not, create an index on the timestamp column anyway, since you often use it in the WHERE clause.

2) Using the BETWEEN clause looks more natural (note that BETWEEN includes both endpoints, unlike the strict > and < in your query). I suggest a BTREE index (the default index type) rather than HASH.

3) When the column has a BTREE index (not HASH), the ORDER BY needs no extra sorting step: the index already holds the values in sorted order. Keep the clause anyway, because without it the order of the result is not guaranteed.

4) An integer Unix timestamp is better than a DATETIME in terms of both storage and performance. Comparing dates is more complicated than comparing integers.

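A lookup on an indexed field takes an O(log(rows)) tree search. Comparing integers is O(1), and comparing dates is O(date_string_length). The difference is therefore (number of tree-search steps) * (ratio of comparison costs) = (O(date_string_length) / O(1)) * O(log(rows)) = O(date_string_length) * O(log(rows)). This estimate assumes dates are compared as strings; MySQL stores DATETIME and TIMESTAMP values in a compact binary form, so in practice the gap is probably much smaller.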
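
Points 1) to 3) could be sketched in MySQL roughly as follows. The table name `articles` and the index name `idx_timestamp` are hypothetical (the question only calls it "table"), and the literal dates stand in for [SomeTime] and [SomeOtherTime].

```sql
-- Hypothetical names: articles, idx_timestamp.
-- BTREE is already the default index type for InnoDB and MyISAM;
-- USING BTREE only makes the choice explicit.
CREATE INDEX idx_timestamp ON articles (`Timestamp`) USING BTREE;

-- BETWEEN is inclusive on both ends, so the bounds differ slightly
-- from the strict > and < used in the question.
SELECT *
FROM articles
WHERE `Timestamp` BETWEEN '2013-06-01 00:00:00' AND '2013-06-01 23:59:59'
  AND publish = 1
  AND `type` = 2
ORDER BY `Timestamp`;  -- may be served directly from the index order
```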

+4

Four million rows is not huge.

The one-byte difference between the DATETIME and TIMESTAMP data types is the last thing you should consider when choosing between them. Read their specifications.
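
The differences that matter more are behavioral. As a minimal sketch, using a throwaway table `ts_demo` (a hypothetical name): TIMESTAMP values are converted to UTC for storage and back to the session time zone on retrieval, and cover roughly 1970 to 2038, while DATETIME values are stored as given and cover the years 1000 to 9999.

```sql
-- Hypothetical demo table; not part of the original question.
CREATE TABLE ts_demo (
  ts TIMESTAMP NOT NULL,
  dt DATETIME  NOT NULL
);

SET time_zone = '+00:00';
INSERT INTO ts_demo (ts, dt)
VALUES ('2013-06-01 12:00:00', '2013-06-01 12:00:00');

-- Change the session time zone and read the same row back.
SET time_zone = '+03:00';
SELECT ts, dt FROM ts_demo;
-- ts is displayed as 2013-06-01 15:00:00 (converted from UTC),
-- dt is still    2013-06-01 12:00:00 (no conversion).
```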

Making a timestamp your primary key is a bad, bad idea. Think about what a primary key is in an SQL database.

Put an index on the timestamp column. Get an execution plan and paste it into your question. Measure your median query time and add it to your question as well.
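
One possible way to follow this advice is sketched here. The schema is an assumption, since the question does not show it: the table name `articles`, the surrogate `id` key, the column types and the index name are all hypothetical.

```sql
-- Assumed schema: a surrogate primary key plus a secondary index
-- on the timestamp, instead of a timestamp primary key.
CREATE TABLE articles (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `Timestamp` TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP,
  publish     TINYINT      NOT NULL,
  `type`      TINYINT      NOT NULL,
  PRIMARY KEY (id),
  KEY idx_timestamp (`Timestamp`)
) ENGINE=InnoDB;

-- Ask MySQL how it would execute the query; the dates are placeholders.
EXPLAIN
SELECT *
FROM articles
WHERE `Timestamp` > '2013-06-01 00:00:00'
  AND `Timestamp` < '2013-06-02 00:00:00'
  AND publish = 1
  AND `type` = 2
ORDER BY `Timestamp`;
```

In the EXPLAIN output, the `key` column shows which index was chosen, and a `type` of `range` indicates an index range scan. "Using filesort" in the `Extra` column would mean the ORDER BY is not being served from the index.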

Returning one day's worth of rows from an indexed table of 4 million rows takes 2 ms on my desktop computer. (It returns about 8000 rows.)

+4

Source: https://habr.com/ru/post/1483724/