I have a large table in mysql.It has 13 million rows.
Mysql version is 5.7.10.
Table structure as below:
create table table_name (
user_id varchar(20) not null,
item_id varchar(20) not null
);
1. First sql:
select count(distinct user_id) from table;
result: 760,000
2. Second sql:
select count(1) from (select user_id from table group by user_id) a;
result: 120,000
user_id is not null for each row.
And, the correct number is 120,000. Why does the first sql get the wrong number?
Then I run the first sql in hive and spark-sql, the result is 120,000.
So, is this a mysql error, or can something be installed to fix everything?
Thank!
Update . I try this on another PC, the result of the first sql is 120,000. This time get the correct number. The version for MySQL is 5.6.26. So, perhaps this is a bug 5.7.10.