Create an API endpoint for dynamic time-based data

I have a scraper that periodically collects articles from news sites and stores them in a MySQL database. The scraper processes the oldest articles first and then moves on to more recent ones.

For example, an article written on 1 January is scraped first and gets ID 1, and an article written on 2 January is scraped next and gets ID 2.

Thus, recent articles will have a higher identifier than older articles.

Several scrapers run at the same time.

Now I need an endpoint that I can query by article timestamp, with a limit of 10 articles per request.

The problem occurs when, for example, 20 articles share the timestamp 1499241705. When I query the endpoint with timestamp 1499241705, the condition returns all articles with a timestamp >= 1499241705, so I get the same 10 articles every time. Changing the condition to > means I skip articles 11-20. Adding another WHERE clause on id does not work either, because the scrapers run concurrently and articles are not always inserted in date order.

Is there a way to query this endpoint so that I always get consistent data, with the latest articles first and older articles after?

EDIT:

   +-----------------------+
   |   id | unix_timestamp |
   +-----------------------+
   |    1 |   1000         |
   |    2 |   1001         |
   |    3 |   1002         |
   |    4 |   1003         |
   |   11 |   1000         |
   |   12 |   1001         |
   |   13 |   1002         |
   |   14 |   1003         |
   +-----------------------+

The last timestamp and id received are passed in the WHERE clause.

e.g.

   $this->db->where('unix_timestamp <=', $timestamp);
   $this->db->where('id <', $offset);
   $this->db->order_by('unix_timestamp', 'DESC');
   $this->db->order_by('id', 'DESC');

Querying with a timestamp of 1003 returns ids 14 and 4. On the next call, id 4 becomes the offset, so id 13 is never returned and the query fetches id 3 next. As a result, data is missing.
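The skipped rows can be reproduced with a small sketch. It uses Python's sqlite3 module in place of MySQL and CodeIgniter, a page size of 2, and a hypothetical table name `articles`, helper `broken_page`, and starting offset of 15; none of these come from the original setup.

```python
import sqlite3

# Sample rows from the table above: (id, unix_timestamp).
ROWS = [(1, 1000), (2, 1001), (3, 1002), (4, 1003),
        (11, 1000), (12, 1001), (13, 1002), (14, 1003)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, unix_timestamp INTEGER)"
)
conn.executemany("INSERT INTO articles VALUES (?, ?)", ROWS)


def broken_page(conn, timestamp, offset_id, limit=2):
    """Apply the two independent filters from the query above."""
    return conn.execute(
        "SELECT id, unix_timestamp FROM articles"
        " WHERE unix_timestamp <= ? AND id < ?"
        " ORDER BY unix_timestamp DESC, id DESC LIMIT ?",
        (timestamp, offset_id, limit),
    ).fetchall()


# First call: the offset 15 is an assumed value above every existing id.
first = broken_page(conn, 1003, 15)
# Second call: reuse the last row's timestamp and id as the new filters.
second = broken_page(conn, first[-1][1], first[-1][0])

print(first)   # ids 14 and 4
print(second)  # ids 3 and 2; ids 13, 12 and 11 can never be returned
```

Because `id < 4` is applied to every row regardless of its timestamp, all rows with ids 11 to 14 are excluded from every later page.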

1 answer

Use two parts: the timestamp and the id.

WHERE   timestamp <= $ts_leftoff
  AND ( timestamp <  $ts_leftoff
            OR id <= $id_leftoff )
ORDER BY timestamp DESC, id DESC

Because id is unique, the order is fully deterministic even when many rows share the same timestamp.
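A minimal sketch of paging with this condition, run against the sample table from the question. It uses Python's sqlite3 module instead of MySQL. The table name `articles` and the helpers `make_db`, `fetch_page` and `fetch_all_ids` are hypothetical. Since the comparison on id is inclusive (`<=`), the sketch assumes the "left off" pair is the first row that has not yet been returned, which it obtains by fetching one row more than the page size.

```python
import sqlite3

# Sample rows from the question: (id, unix_timestamp).
ROWS = [(1, 1000), (2, 1001), (3, 1002), (4, 1003),
        (11, 1000), (12, 1001), (13, 1002), (14, 1003)]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, unix_timestamp INTEGER)"
    )
    conn.executemany("INSERT INTO articles VALUES (?, ?)", ROWS)
    return conn


def fetch_page(conn, limit, left_off=None):
    """Return (page, next_left_off).

    left_off is a (timestamp, id) pair naming the first row of this page,
    or None for the very first page.
    """
    sql = "SELECT id, unix_timestamp FROM articles"
    params = []
    if left_off is not None:
        ts, row_id = left_off
        sql += (" WHERE unix_timestamp <= ?"
                " AND (unix_timestamp < ? OR id <= ?)")
        params += [ts, ts, row_id]
    sql += " ORDER BY unix_timestamp DESC, id DESC LIMIT ?"
    params.append(limit + 1)  # one extra row marks where the next page starts
    rows = conn.execute(sql, params).fetchall()
    page, extra = rows[:limit], rows[limit:]
    next_left_off = (extra[0][1], extra[0][0]) if extra else None
    return page, next_left_off


def fetch_all_ids(conn, limit):
    """Walk every page and collect the ids in the order they are served."""
    ids, left_off = [], None
    while True:
        page, left_off = fetch_page(conn, limit, left_off)
        ids += [row[0] for row in page]
        if left_off is None:
            return ids


print(fetch_all_ids(make_db(), 2))
```

With a page size of 2, the pages come back as ids 14, 4, then 13, 3, then 12, 2, then 11, 1, so no row is skipped or repeated.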

The same condition can also be written as a row comparison:

WHERE (timestamp, id) <= ($ts_leftoff, $id_leftoff)

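However, MySQL may not optimize the row-comparison form as well as the expanded condition, so the expanded condition is the safer choice.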

More on the "left off" approach: http://mysql.rjweb.org/doc.php/pagination


Source: https://habr.com/ru/post/1680774/

