Is it a good idea to store the row count and row number to speed up pagination?

My site has over 20,000,000 entries; entries have categories (FK) and tags (M2M). Even a query as simple as SELECT id FROM table ORDER BY id LIMIT 1000000, 10 forces MySQL to scan 1,000,010 rows, and that is unacceptably slow (primary keys, indexes, joins, etc. do not help here; it is still 1,000,010 rows). So I am trying to speed up pagination by storing the row count and each row's number with triggers like this:

 DELIMITER //
 CREATE TRIGGER before_entry_insert BEFORE INSERT ON entry_table
 FOR EACH ROW
 BEGIN
   UPDATE category_table
      SET row_count = (@rc := row_count + 1)
    WHERE id = NEW.category_id;
   -- NEW.* can only be assigned in a BEFORE trigger, and only via SET
   SET NEW.row_number_in_category = @rc;
 END //
 DELIMITER ;

And then I can simply:

 SELECT * FROM entry_table WHERE row_number_in_category > 10 ORDER BY row_number_in_category LIMIT 10 

(Now only 10 rows are scanned, so selects are fast; inserts become slower, but they are rare compared to selects, so that is acceptable.)

Is this a bad approach and are there any good alternatives?

1 answer

While I like the solution proposed in the question, it can cause problems if data in entry_table changes: rows may be deleted or reassigned to a different category over time.
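To illustrate the delete case with a sketch of my own (not part of the answer): MySQL will not let a trigger on entry_table update entry_table itself, so the cleanup has to run from application code, and every later row in the same category must be renumbered, along these lines:

 -- @deleted_category_id and @deleted_row_number are assumed to hold
 -- the values from the row that was just deleted
 UPDATE entry_table
    SET row_number_in_category = row_number_in_category - 1
  WHERE category_id = @deleted_category_id
    AND row_number_in_category > @deleted_row_number;

 UPDATE category_table
    SET row_count = row_count - 1
  WHERE id = @deleted_category_id;

For a large category this rewrites a big share of its rows on every delete, which is why the scheme suits append-only data best.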

It also restricts how the data can be sorted; the method assumes the data is only ever sorted in insertion order. Covering multiple sort orders would require additional triggers and summary data.

One alternative way to paginate is to pass an offset on the field you are sorting/paging by, instead of an offset to the LIMIT clause.

Instead of this:

 SELECT id FROM table ORDER BY id LIMIT 1000000, 10 

Do this, assuming in this case that the last result viewed had an id of 1,000,000:

 SELECT id FROM table WHERE id > 1000000 ORDER BY id LIMIT 0, 10 

By tracking this value as part of the pagination state, it can be passed into subsequent queries, and it saves the database from sorting rows that will never be part of the final result.

If you really only need 10 rows out of 20 million, you can go further and guess that the next 10 matching rows will fall within the next 1,000 ids. With a little application logic, the query can be retried with a larger range if that turns out not to be the case.

 SELECT id FROM table WHERE id BETWEEN 1000000 AND 1001000 ORDER BY id LIMIT 0, 10 

This should be significantly faster, because the sort only has to cover a small range of ids and can likely produce the result in a single pass.
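As a rough sketch of that retry idea (my own illustration, not from the answer; the procedure name, the 1,000-id starting window and the 10x widening factor are assumptions), the logic could live in a stored procedure that widens the range until enough rows are found. The retry only matters when ids have gaps, for example from deleted rows or extra filters:

 DELIMITER //
 CREATE PROCEDURE next_page(IN last_id INT)
 BEGIN
   DECLARE win INT DEFAULT 1000;   -- initial guess for the id range
   DECLARE found INT DEFAULT 0;
   -- widen the window until it contains at least 10 rows
   -- (or until it covers the rest of the table)
   WHILE found < 10 AND win < 20000000 DO
     SELECT COUNT(*) INTO found
       FROM entry_table
      WHERE id > last_id AND id <= last_id + win;
     IF found < 10 THEN
       SET win = win * 10;
     END IF;
   END WHILE;
   SELECT id
     FROM entry_table
    WHERE id > last_id AND id <= last_id + win
    ORDER BY id
    LIMIT 0, 10;
 END //
 DELIMITER ;

Calling CALL next_page(1000000); returns the next 10 ids after 1,000,000, falling back to a wider range when the first window is too sparse.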

