I have a scraper that periodically scrapes articles from news sites and stores them in a MySQL database. The scraper fetches the oldest articles first and then moves on to more recent ones.
For example, an article written on 1 January is scraped first and gets ID 1, and an article written on 2 January is scraped next and gets ID 2.
Thus, recent articles will have a higher identifier than older articles.
Several scrapers run at the same time.
Now I need an endpoint that I can query for articles by timestamp, with a limit of 10 articles per request.
The problem occurs when, for example, 20 articles share the timestamp 1499241705. When I query the endpoint with timestamp 1499241705, it returns all articles with a timestamp >= 1499241705, so I get the same 10 articles every time. Changing the condition to > means I skip articles 11-20. Adding another WHERE clause on the id does not work either, because articles are not always inserted in date order, since the scrapers run concurrently.
Is there a way to query this endpoint so that I always get consistent data, with the latest articles first and then the older ones?
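To make the first problem concrete, here is a minimal sketch in plain PHP that stands in for the database. The `fetchSince` helper, the array layout, and the ids 1 to 20 are hypothetical and only mimic a `WHERE unix_timestamp >= ?` (or `>`) query with `LIMIT 10`; the real endpoint uses MySQL.

```php
<?php
// 20 hypothetical articles that all share one timestamp (ids 1..20).
$articles = [];
for ($id = 1; $id <= 20; $id++) {
    $articles[] = ['id' => $id, 'unix_timestamp' => 1499241705];
}

// Hypothetical stand-in for:
//   WHERE unix_timestamp >= $since   (or > $since when $inclusive is false)
//   LIMIT $limit
function fetchSince(array $articles, int $since, bool $inclusive, int $limit = 10): array
{
    $matches = array_filter($articles, function (array $a) use ($since, $inclusive): bool {
        return $inclusive
            ? $a['unix_timestamp'] >= $since
            : $a['unix_timestamp'] > $since;
    });
    return array_slice(array_values($matches), 0, $limit);
}

$first = fetchSince($articles, 1499241705, true);

// The next request reuses the last timestamp that was received.
$lastTimestamp = end($first)['unix_timestamp'];
$again  = fetchSince($articles, $lastTimestamp, true);   // condition >=
$strict = fetchSince($articles, $lastTimestamp, false);  // condition >

var_dump($first === $again); // bool(true): the same 10 articles come back
var_dump(count($strict));    // int(0): articles 11-20 are never returned
```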
EDIT:
+----+----------------+
| id | unix_timestamp |
+----+----------------+
|  1 |           1000 |
|  2 |           1001 |
|  3 |           1002 |
|  4 |           1003 |
| 11 |           1000 |
| 12 |           1001 |
| 13 |           1002 |
| 14 |           1003 |
+----+----------------+
The last timestamp and id that were received are passed into the WHERE clause, e.g.:
$this->db->where('unix_timestamp <=', $timestamp);
$this->db->where('id <', $offset);
$this->db->order_by('unix_timestamp', 'DESC');
$this->db->order_by('id', 'DESC');
When querying with a timestamp of 1003, ids 14 and 4 are selected. On the next call, id 4 is used as the offset, so id 13 is never returned and the query fetches id 3 instead. As a result, data goes missing.
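The behaviour can be reproduced without a database. The sketch below is plain PHP that imitates the two WHERE clauses and the ordering on the sample rows; the `fetchPage` helper, the page size of 2, and the use of `PHP_INT_MAX` as the initial offset are assumptions made only for illustration.

```php
<?php
// Sample rows from the table above: [id, unix_timestamp].
$rows = [
    [1, 1000], [2, 1001], [3, 1002], [4, 1003],
    [11, 1000], [12, 1001], [13, 1002], [14, 1003],
];

// Hypothetical stand-in for the query builder calls:
//   WHERE unix_timestamp <= $timestamp AND id < $offset
//   ORDER BY unix_timestamp DESC, id DESC LIMIT $limit
function fetchPage(array $rows, int $timestamp, int $offset, int $limit): array
{
    $matches = array_filter($rows, function (array $row) use ($timestamp, $offset): bool {
        return $row[1] <= $timestamp && $row[0] < $offset;
    });
    usort($matches, function (array $a, array $b): int {
        // Newest timestamp first, then highest id first.
        return [$b[1], $b[0]] <=> [$a[1], $a[0]];
    });
    return array_slice($matches, 0, $limit);
}

// First call: no offset yet, so every id qualifies.
$page1 = fetchPage($rows, 1003, PHP_INT_MAX, 2);

// Second call: reuse the last row's timestamp and id, as the endpoint does.
$last  = end($page1);
$page2 = fetchPage($rows, $last[1], $last[0], 2);

echo implode(',', array_column($page1, 0)), "\n"; // 14,4
echo implode(',', array_column($page2, 0)), "\n"; // 3,2
```

Because the id filter is applied independently of the timestamp filter, ids 13, 12 and 11 are excluded from every later page even though their timestamps are still in range.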