MySQL: dependent sub-query with NOT IN in WHERE clause is very slow

I audit user activity in my application, which uses OpenID login. The first time a user logs in with an OpenID provider, we treat it as a sign-up. I am generating an audit report from this data. Sample table data:

    +---------+----------+-----------+---------------+
    | USER_ID | PROVIDER | OPERATION | TIMESTAMP     |
    +---------+----------+-----------+---------------+
    |     120 | Google   | SIGN_UP   | 1347296347000 |
    |     120 | Google   | SIGN_IN   | 1347296347000 |
    |     121 | Yahoo    | SIGN_IN   | 1347296347000 |
    |     122 | Yahoo    | SIGN_IN   | 1347296347000 |
    |     120 | Google   | SIGN_UP   | 1347296347000 |
    |     120 | FaceBook | SIGN_IN   | 1347296347000 |
    +---------+----------+-----------+---------------+

From this table, I want a per-provider count of users with a SIGN_IN operation, excluding users who already have a SIGN_UP for the same provider.

SHOW CREATE TABLE output:

    CREATE TABLE `signin_details` (
      `USER_ID` int(11) DEFAULT NULL,
      `PROVIDER` char(40) DEFAULT NULL,
      `OPERATION` char(40) DEFAULT NULL,
      `TIMESTAMP` bigint(20) DEFAULT NULL
    ) ENGINE=InnoDB

I am using this query.

    select count(distinct(USER_ID)) as signin_count, PROVIDER
    from signin_details s1
    where s1.USER_ID NOT IN (
        select USER_ID
        from signin_details
        where signin_details.PROVIDER=s1.PROVIDER
          and signin_details.OPERATION='SIGN_UP'
          and signin_details.TIMESTAMP/1000 BETWEEN UNIX_TIMESTAMP(CURRENT_DATE()-INTERVAL 1 DAY) * 1000
                                                AND UNIX_TIMESTAMP(CURRENT_DATE()) * 1000
    )
    AND OPERATION='SIGN_IN'
    group by PROVIDER;

EXPLAIN output:

    +----+--------------------+----------------+------+---------------+------+---------+------+------+-----------------------------+
    | id | select_type        | table          | type | possible_keys | key  | key_len | ref  | rows | Extra                       |
    +----+--------------------+----------------+------+---------------+------+---------+------+------+-----------------------------+
    |  1 | PRIMARY            | s1             | ALL  | NULL          | NULL | NULL    | NULL |    6 | Using where; Using filesort |
    |  2 | DEPENDENT SUBQUERY | signin_details | ALL  | NULL          | NULL | NULL    | NULL |    6 | Using where                 |
    +----+--------------------+----------------+------+---------------+------+---------+------+------+-----------------------------+

Query output:

    +--------------+----------+
    | signin_count | PROVIDER |
    +--------------+----------+
    |            1 | FaceBook |
    |            2 | Yahoo    |
    +--------------+----------+

It takes more than 40 minutes to complete on 200k rows.

My assumption is that each outer row is checked against the full set of dependent subquery results.

This is how I picture the query running:

    A -> Dependent outputs (B, C, D)
    A check with B
    A check with C
    A check with D

If the dependent subquery output is large, this takes a long time to complete. How can I improve this query?

1 answer

If you are using MySQL, you should know that subqueries are awfully slow.

IN is slow...

EXISTS is often faster than IN.
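For comparison, here is a minimal sketch of what a NOT EXISTS form of the asker's condition might look like. The alias `s2` is introduced for illustration, the time window is copied from the join query in this answer (without the `/1000` from the question), and it has not been benchmarked against the 200k-row table.

```sql
-- Sketch only: NOT EXISTS variant of the asker's NOT IN filter.
-- The alias s2 is an assumption made for this example.
SELECT s1.PROVIDER,
       COUNT(DISTINCT s1.USER_ID) AS signin_count
FROM signin_details s1
WHERE s1.OPERATION = 'SIGN_IN'
  AND NOT EXISTS (
        SELECT 1
        FROM signin_details s2
        WHERE s2.USER_ID   = s1.USER_ID
          AND s2.PROVIDER  = s1.PROVIDER
          AND s2.OPERATION = 'SIGN_UP'
          AND s2.`TIMESTAMP` BETWEEN UNIX_TIMESTAMP(CURRENT_DATE() - INTERVAL 1 DAY) * 1000
                                 AND UNIX_TIMESTAMP(CURRENT_DATE()) * 1000
      )
GROUP BY s1.PROVIDER;
```

One difference to keep in mind: `USER_ID` is nullable in this table, and NOT IN returns no rows at all when the subquery yields a NULL, while NOT EXISTS simply ignores such rows. On data without NULL user IDs the two forms should agree.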

JOIN is the fastest way to do such things.

    SELECT DISTINCT s1.PROVIDER, COUNT(DISTINCT s1.USER_ID)
    FROM signin_details s1
    LEFT JOIN (
        SELECT DISTINCT USER_ID, PROVIDER
        FROM signin_details
        WHERE signin_details.OPERATION='SIGN_UP'
          AND signin_details.TIMESTAMP BETWEEN UNIX_TIMESTAMP(CURRENT_DATE()-INTERVAL 1 DAY) * 1000
                                           AND UNIX_TIMESTAMP(CURRENT_DATE()) * 1000
    ) AS t USING (USER_ID, PROVIDER)
    WHERE t.USER_ID IS NULL
      AND OPERATION='SIGN_IN'
    GROUP BY s1.PROVIDER

http://sqlfiddle.com/#!2/122ac/12

NOTE: If the sqlfiddle results look surprising, keep in mind the UNIX_TIMESTAMP calls in the query: the time window depends on the current date.

Result:

    | PROVIDER | COUNT(DISTINCT S1.USER_ID) |
    -----------------------------------------
    | FaceBook |                          1 |
    | Yahoo    |                          2 |

This is the familiar MySQL and INTERSECT story. First you collect all USER_ID and PROVIDER combinations that you do not want to count. Then you LEFT JOIN them to your data. All the rows you want to count now have no matching values from the LEFT JOIN, so you select them with t.USER_ID IS NULL.
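The join described above still has to read signin_details twice, and the EXPLAIN output in the question reports type ALL with no possible_keys for both accesses. A composite index may help here. The sketch below is an assumption, not part of the original answer: the index name is hypothetical, and the column order is only a guess that puts the equality filters first.

```sql
-- Hypothetical index name; column order is an assumption to be checked with EXPLAIN.
ALTER TABLE signin_details
  ADD INDEX idx_operation_provider_user_ts (OPERATION, PROVIDER, USER_ID, `TIMESTAMP`);

-- Re-check the plan afterwards and compare it with the EXPLAIN output in the question.
EXPLAIN
SELECT s1.PROVIDER, COUNT(DISTINCT s1.USER_ID)
FROM signin_details s1
LEFT JOIN (
    SELECT DISTINCT USER_ID, PROVIDER
    FROM signin_details
    WHERE OPERATION = 'SIGN_UP'
      AND `TIMESTAMP` BETWEEN UNIX_TIMESTAMP(CURRENT_DATE() - INTERVAL 1 DAY) * 1000
                          AND UNIX_TIMESTAMP(CURRENT_DATE()) * 1000
) AS t USING (USER_ID, PROVIDER)
WHERE t.USER_ID IS NULL
  AND s1.OPERATION = 'SIGN_IN'
GROUP BY s1.PROVIDER;
```

Whether the optimizer actually uses the index depends on the MySQL version and the data distribution, so the new plan is worth checking before relying on it.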


Input:

    | rn° | USER_ID | PROVIDER | OPERATION |     TIMESTAMP |
    --------------------------------------------------------
    |   1 |     120 | Google   | SIGN_UP   | 1347296347000 | -
    |   2 |     120 | Google   | SIGN_IN   | 1347296347000 | - (see rn° 1)
    |   3 |     121 | Yahoo    | SIGN_IN   | 1347296347000 | Y
    |   4 |     122 | Yahoo    | SIGN_IN   | 1347296347000 | Y
    |   5 |     120 | Google   | SIGN_UP   | 1347296347000 | -
    |   6 |     120 | FaceBook | SIGN_IN   | 1347296347000 | F
    |   7 |     119 | FaceBook | SIGN_IN   | 1347296347000 | - (see rn° 8)
    |   8 |     119 | FaceBook | SIGN_UP   | 1347296347000 | -
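Because the query's time window follows the current date, the eight input rows will not fall inside it on any later day. One way to reproduce the result is to load them into a scratch table and pin the report date. The sketch below assumes a hypothetical table name `signin_details_demo`, a UTC session time zone, and a fixed report date of 2012-09-11, chosen because 1347296347000 ms falls on 2012-09-10 in UTC.

```sql
-- Scratch copy of the table (hypothetical name) so real data is untouched.
CREATE TABLE signin_details_demo (
  USER_ID     INT      DEFAULT NULL,
  PROVIDER    CHAR(40) DEFAULT NULL,
  OPERATION   CHAR(40) DEFAULT NULL,
  `TIMESTAMP` BIGINT   DEFAULT NULL
) ENGINE=InnoDB;

-- The eight rows from the input table, in the same order.
INSERT INTO signin_details_demo (USER_ID, PROVIDER, OPERATION, `TIMESTAMP`) VALUES
  (120, 'Google',   'SIGN_UP', 1347296347000),
  (120, 'Google',   'SIGN_IN', 1347296347000),
  (121, 'Yahoo',    'SIGN_IN', 1347296347000),
  (122, 'Yahoo',    'SIGN_IN', 1347296347000),
  (120, 'Google',   'SIGN_UP', 1347296347000),
  (120, 'FaceBook', 'SIGN_IN', 1347296347000),
  (119, 'FaceBook', 'SIGN_IN', 1347296347000),
  (119, 'FaceBook', 'SIGN_UP', 1347296347000);

-- UNIX_TIMESTAMP() interprets dates in the session time zone; pin it to UTC.
SET time_zone = '+00:00';

-- Same anti-join as in the answer, with CURRENT_DATE() replaced by an
-- assumed fixed report date so the window covers the sample rows.
SELECT s1.PROVIDER, COUNT(DISTINCT s1.USER_ID) AS signin_count
FROM signin_details_demo s1
LEFT JOIN (
    SELECT DISTINCT USER_ID, PROVIDER
    FROM signin_details_demo
    WHERE OPERATION = 'SIGN_UP'
      AND `TIMESTAMP` BETWEEN UNIX_TIMESTAMP(DATE('2012-09-11') - INTERVAL 1 DAY) * 1000
                          AND UNIX_TIMESTAMP(DATE('2012-09-11')) * 1000
) AS t USING (USER_ID, PROVIDER)
WHERE t.USER_ID IS NULL
  AND s1.OPERATION = 'SIGN_IN'
GROUP BY s1.PROVIDER;
```

Under these assumptions the query should return the same two rows as the result shown in the answer: FaceBook with 1 and Yahoo with 2.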

Source: https://habr.com/ru/post/1433427/

