How you group, bucket and filter result sets for a very large dataset

Sorry, the question title is somewhat vague, so here is a working example.

I have a table in which each userid (userid) gets a value every few days. I want to find the last of these values ​​for each user, broken down by months, and count their number in the range.

Here is an example of a table and representative data:

CREATE TABLE `datasource` (
    `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
    `userId` INT UNSIGNED NOT NULL ,
    `unixts` INT UNSIGNED NOT NULL ,
    `value` INT UNSIGNED NOT NULL ,
    INDEX ( `userId` )
);

INSERT INTO `datasource` 
    (`userId`, `unixts`, `value`)
VALUES 
    (1, UNIX_TIMESTAMP('2010-07-01'), 500),
    (1, UNIX_TIMESTAMP('2010-07-15'), 610),
    (1, UNIX_TIMESTAMP('2010-08-02'), 740),

    (2, UNIX_TIMESTAMP('2010-07-03'), 506),
    (2, UNIX_TIMESTAMP('2010-07-18'), 640),
    (2, UNIX_TIMESTAMP('2010-08-09'), 340),

    (3, UNIX_TIMESTAMP('2010-07-03'), 506),
    (3, UNIX_TIMESTAMP('2010-08-18'), 640)
;

Now here is a request to get what I get after:

select
    month(FROM_UNIXTIME(unixts)) as month,
    sum( if( value >= 700, 1, 0) ) as '700 and up',
    sum( if( value BETWEEN 600 AND 699, 1, 0) ) as '600-699',
    sum( if( value BETWEEN 500 AND 599, 1, 0) ) as '500-599',
    sum( if( value <= 499, 1, 0) ) as '499 and below',
    count(*) as total
from
    datasource
where
    id in (
        select 
            max(id)
        from 
            datasource 
        where 
            unixts between UNIX_TIMESTAMP('2010-07-01') and UNIX_TIMESTAMP('2010-09-01')
        group by 
            userId, month(from_unixtime(unixts))
    )
group by
    month(FROM_UNIXTIME(unixts));

+-------+------------+---------+---------+---------------+-------+
| month | 700 and up | 600-699 | 500-599 | 499 and below | total |
+-------+------------+---------+---------+---------------+-------+
|     7 |          0 |       2 |       1 |             0 |     3 |
|     8 |          1 |       1 |       0 |             1 |     3 |
+-------+------------+---------+---------+---------------+-------+

This query is great for our small result set. However, if you drop 44 million rows into the data table, it stops.

Is there an optimized way to write this query that can achieve what I want without mysql binding in a few days?

+3
2

EXPLAIN...;

, . , - . , , Cfreak. , , , .

+2

.

create index value_index ON datasource(value)

. .

0

Source: https://habr.com/ru/post/1758447/


All Articles