Optimization of long query in mysql in huge table size 33M rows

Inquiry:

SELECT users.id as uid, name, avatar, avatar_date, driver, messages.id AS mid,messages.msg, messages.removed, messages.from_anonym_id, messages.t o_anonym_id, (messages.date DIV 1000) AS date, from_id = 162077 as outbox, !(0 in (SELECT read_state FROM messages as msgs WHERE (msgs.from_id = messages.from_id or msgs.from_id = messages.user_id) and msgs.user_id = 162077 and removed = 0)) as read_state FROM dialog, messages, users WHERE messages.id = dialog.mid and ((uid1 = 162077 and users.id = uid2) or (uid2 = 162077 and users.id = uid1) ) ORDER BY dialog.mid DESC LIMIT 0, 101; 

Table structure:

 mysql> desc messages; +----------------+------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +----------------+------------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | from_id | int(11) | NO | MUL | NULL | | | user_id | int(11) | NO | MUL | NULL | | | group_id | int(11) | NO | | NULL | | | to_number | varchar(30) | NO | MUL | NULL | | | msg | text | NO | | NULL | | | image | varchar(20) | NO | | NULL | | | date | bigint(20) | NO | | NULL | | | read_state | tinyint(1) | NO | | 0 | | | removed | tinyint(1) | NO | MUL | NULL | | | from_anonym_id | int(10) unsigned | NO | MUL | NULL | | | to_anonym_id | int(10) unsigned | NO | MUL | NULL | | +----------------+------------------+------+-----+---------+----------------+ mysql> desc dialog; +----------------+------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +----------------+------------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | uid1 | int(11) | NO | MUL | NULL | | | uid2 | int(11) | NO | MUL | NULL | | | mid | int(11) | NO | MUL | NULL | | | from_anonym_id | int(10) unsigned | NO | MUL | NULL | | | to_anonym_id | int(10) unsigned | NO | MUL | NULL | | +----------------+------------------+------+-----+---------+----------------+ mysql> show index from messages; +----------+------------+----------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +----------+------------+----------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | messages | 0 | PRIMARY | 1 | id | A | 42944290 | NULL | NULL | | BTREE | | | | messages | 1 | user_id_2 | 1 | user_id | A | 2147214 | NULL | NULL | | BTREE | | | | messages | 1 | user_id_2 | 2 | read_state | A | 2862952 | NULL | NULL | | BTREE | | | | messages | 1 | user_id_2 | 3 | removed | A | 2862952 | NULL | NULL | | BTREE | | | | messages | 1 | from_id | 1 | from_id | A | 825851 | NULL | NULL | | BTREE | | | | messages | 1 | from_id | 2 | to_number | A | 825851 | NULL | NULL | | BTREE | | | | messages | 1 | to_number | 1 | to_number | A | 29 | NULL | NULL | | BTREE | | | | messages | 1 | idx_user_id | 1 | user_id | A | 2044966 | NULL | NULL | | BTREE | | | | messages | 1 | idx_from_id | 1 | from_id | A | 447336 | NULL | NULL | | BTREE | | | | messages | 1 | removed | 1 | removed | A | 29 | NULL | NULL | | BTREE | | | | messages | 1 | from_anonym_id | 1 | from_anonym_id | A | 29 | NULL | NULL | | BTREE | | | | messages | 1 | to_anonym_id | 1 | to_anonym_id | A | 29 | NULL | NULL | | BTREE | | | +----------+------------+----------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 12 rows in set (0.01 sec) mysql> show index from dialog; +--------+------------+----------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +--------+------------+----------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | dialog | 0 | PRIMARY | 1 | id | A | 6378161 | NULL | NULL | | BTREE | | | | dialog | 1 | uid1 | 1 | uid1 | A | 455582 | NULL | NULL | | BTREE | | | | dialog | 1 | uid1 | 2 | uid2 | A | 6378161 | NULL | NULL | | BTREE | | | | dialog | 1 | uid2 | 1 | uid2 | A | 2126053 | NULL | NULL | | BTREE | | | | dialog | 1 | idx_mid | 1 | mid | A | 6378161 | NULL | NULL | | BTREE | | | | dialog | 1 | from_anonym_id | 1 | from_anonym_id | A | 17 | NULL | NULL | | BTREE | | | | dialog | 1 | to_anonym_id | 1 | to_anonym_id | A | 17 | NULL | NULL | | BTREE | | | +--------+------------+----------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 

PS, please do not advise me on a theoretical recipe, only practical examples. thanks in advance.

If I remove this statement

 !(0 in (SELECT read_state FROM messages as msgs WHERE (msgs.from_id = messages.from_id or msgs.from_id = messages.user_id) and msgs.user_id = 162077 and removed = 0)) as read_state 

the request works very well compared to the original: 101 lines in the set (0.04 sec)

I believe this is the main problem, but I need this field. Maybe someone can turn this round and make it faster, it would be very nice.

+5
source share
3 answers

I would start with an index in the message table. A component index that will help cover the connection, as I have in the sample request below ... Index on (user_id, deleted, read_state, from_id).

Next, an explanation of my process. I make a preliminary request from the dialog table as UNION, but each of them accordingly captures the opposite identifier for "LinkToUser" for the next loop associated with the user table after matching with "OR", as in the where clause. Getting qualified records ahead and simplifying can help you.

The next part will indicate the index for your posts. I am making a left join based on a specific user, remote = 0 and SPECIFICALLY read_state = 0. Using the index, it will either find a match or it won’t. Thus, your Selected field (! 0 in ...) is simply simplified to an IS NULL check.

 SELECT u.id as uid, u.name, avatar, avatar_date, driver, m.id AS mid, m.msg, m.removed, m.from_anonym_id, m.to_anonym_id, (m.date DIV 1000) AS date, from_id = 162077 as outbox, msgFrom.from_id IS NULL as read_state FROM ( select distinct d1.*, d1.uid2 as LinkToUser from dialog d1 where d1.uid1 = 162077 union select d2.*, d2.uid1 as LinkToUser from dialog d2 where d2.uid2 = 162077 ) Qualified JOIN Users u ON Qualified.LinkToUser = u.id JOIN Messages m ON Qualified.mid = m.id LEFT JOIN Messages msgFrom ON msgFrom.user_id = 160277 AND msgFrom.Removed = 0 AND msgFrom.Read_State = 0 AND ( m.from_id = msgFrom.from_id OR m.user_id = msgFrom.from_id ) ORDER BY Qualified.mid DESC LIMIT 0, 101; 

you may need to play a little with him, maybe change something like ..

 if( msgFrom.from_id IS NULL, 0, msgFrom.read_state ) as Read_State 

EXPLANATIONS

Zeusakm, your individual read_state field, as written, will ONLY return 1 or 0, since this is a logical condition NOT - the value zero in the selected message list. It will never return -1, as you indicated in your comment. My version does the same. If it finds zero, return zero .. if it cannot find zero, it returns 1 because the comparison value will be NULL, and thus "IsThisValue IS NULL" returns true, which matches flag 1.

So, we hope this clarifies what I'm doing with the left join for you. Explicitly looking for the user ID, the remote state and the read state and (from or match the user ID).

+3
source

This is your query with join syntax and table aliases added for the outer query tables:

 SELECT u.id as uid, name, avatar, avatar_date, driver, m.id AS mid, m.msg, m.removed, m.from_anonym_id, mt o_anonym_id, (m.date DIV 1000) AS date, from_id = 162077 as outbox, !(0 in (SELECT read_state FROM messages m2 WHERE (m2.from_id = m.from_id or m2.from_id = m.user_id) and m2.user_id = 162077 and removed = 0 ) ) as read_state FROM dialog d join messages m on m.id = d.mid join users u on (uid1 = 162077 and users.id = uid2) or (uid2 = 162077 and users.id = uid1) ORDER BY d.mid DESC LIMIT 0, 101; 

If the query works well without a subquery in the select clause, I would recommend replacing it. in can be an expensive statement, especially with or on conditions. Therefore, I would recommend replacing it with:

  (case when exists (select 1 from messages m2 where m2.user_id = 162077 and m2.removed = 0 and m2.from_id = m.from_id and m2.read_state = 0 ) then 0 when exists (select 1 from messages m2 where m2.user_id = 162077 and m2.removed = 0 and m2.from_id = m.user_id and m2.read_state = 0 ) then 0 else 1 end) 

And you need an index on messages(from_id, user_id, removed, read_state) .

+3
source

create a temporary table and insert all columns except readstate, the default is -1, and also keep form_id update the readstate column, similar to the Gordon post.

 CREATE TEMPORARY TABLE userTable SELECT u.id as uid, name, avatar, avatar_date, driver, m.id AS mid, m.msg, m.removed, m.from_anonym_id, mt o_anonym_id, (m.date DIV 1000) AS date, from_id = 162077 as outbox, m.form_id, -1 as read_state FROM dialog d join messages m on m.id = d.mid join users u on (uid1 = 162077 and users.id = uid2) or (uid2 = 162077 and users.id = uid1) ORDER BY d.mid DESC LIMIT 0, 101; update userTable set readstate = (case when exists (select 1 from messages m2 where m2.user_id = 162077 and m2.removed = 0 and m2.from_id = userTable.from_id and m2.read_state = 0 ) then 0 when exists (select 1 from messages m2 where m2.user_id = 162077 and m2.removed = 0 and m2.from_id = userTable.uid and m2.read_state = 0 ) then 0 else 1 end) 
+2
source

Source: https://habr.com/ru/post/1201922/


All Articles