I verify user data from my application using the public login ID. If for the first time the user enters the OPEN ID login, we consider it as registration. I am creating an audit report using this data. Sample table data.
+---------+----------+-----------+---------------+ | USER_ID | PROVIDER | OPERATION | TIMESTAMP | +---------+----------+-----------+---------------+ | 120 | Google | SIGN_UP | 1347296347000 | | 120 | Google | SIGN_IN | 1347296347000 | | 121 | Yahoo | SIGN_IN | 1347296347000 | | 122 | Yahoo | SIGN_IN | 1347296347000 | | 120 | Google | SIGN_UP | 1347296347000 | | 120 | FaceBook | SIGN_IN | 1347296347000 | +---------+----------+-----------+---------------+
In this table, I want to exclude already SIGN_UP ed " SIGN_IN " provider-based user counting.
Show Create Table
CREATE TABLE `signin_details` ( `USER_ID` int(11) DEFAULT NULL, `PROVIDER` char(40) DEFAULT NULL, `OPERATION` char(40) DEFAULT NULL, `TIMESTAMP` bigint(20) DEFAULT NULL ) ENGINE=InnoDB
I am using this query.
select count(distinct(USER_ID)) as signin_count, PROVIDER from signin_details s1 where s1.USER_ID NOT IN ( select USER_ID from signin_details where signin_details.PROVIDER=s1.PROVIDER and signin_details.OPERATION='SIGN_UP' and signin_details.TIMESTAMP/1000 BETWEEN UNIX_TIMESTAMP(CURRENT_DATE()-INTERVAL 1 DAY) * 1000 AND UNIX_TIMESTAMP(CURRENT_DATE()) * 1000 ) AND OPERATION='SIGN_IN' group by PROVIDER;
Explain the conclusion:
+----+--------------------+----------------+------+---------------+------+---------+------+------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+----------------+------+---------------+------+---------+------+------+-----------------------------+ | 1 | PRIMARY | s1 | ALL | NULL | NULL | NULL | NULL | 6 | Using where; Using filesort | | 2 | DEPENDENT SUBQUERY | signin_details | ALL | NULL | NULL | NULL | NULL | 6 | Using where | +----+--------------------+----------------+------+---------------+------+---------+------+------+-----------------------------+
Query output:
+--------------+----------+ | signin_count | PROVIDER | +--------------+----------+ | 1 | FaceBook | | 2 | Yahoo | +--------------+----------+
It takes more than 40 minutes to complete 200k lines.
My guess is to check each row with the total number of dependent subquery results.
My guess is on this request.
A -> Dependant Outputs (B,C,D) . A check with B A check with C A check with D
If the dependent query output is larger, it will take a long time to complete. How to improve this query?