How to crawl feeds

My application needs to track RSS/Atom feeds and store new entries in a database. My question is: what is the most reliable way to determine whether a feed entry has already been crawled?

I use the Universal Feed Parser module to parse feeds. My current implementation stores the most recent value of feed.entry[i].updated_parsed. While crawling, if an entry's updated_parsed value is greater than the stored value, that entry is saved to the database. The problem is that many feeds provide neither a published date nor an updated date.
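Here is a minimal sketch of the date-based approach described above. It shows exactly where it breaks down: entries with no date cannot be compared at all. It assumes entries behave like plain mappings (feedparser entries are dict subclasses, and their *_parsed values are UTC time.struct_time objects). The function name and the fallback to published_parsed are illustrative assumptions, not part of the original code.

```python
import time
from typing import Iterable, List, Mapping, Optional, Tuple


def new_entries_by_date(
    entries: Iterable[Mapping],
    last_seen: Optional[time.struct_time],
) -> Tuple[List[Mapping], List[Mapping], Optional[time.struct_time]]:
    """Hypothetical helper: split entries into (newer, undated) and return the newest stamp."""
    newer: List[Mapping] = []
    undated: List[Mapping] = []
    newest = last_seen
    for entry in entries:
        # Assumption: fall back to the published date when there is no updated date.
        stamp = entry.get("updated_parsed") or entry.get("published_parsed")
        if stamp is None:
            # No date at all: this approach cannot tell whether the entry is new.
            undated.append(entry)
            continue
        # struct_time compares like a tuple (year, month, day, hour, ...).
        if last_seen is None or stamp > last_seen:
            newer.append(entry)
            if newest is None or stamp > newest:
                newest = stamp
    return newer, undated, newest
```

Every entry that lands in the undated list is one this scheme cannot classify. That gap is why a date alone is not a reliable "already crawled" marker.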

1 answer

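Use the entry's <guid> (or its <link>, if there is no <guid>) as its unique identifier, and check whether an entry with that identifier is already stored in the database.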
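Here is a minimal sketch of that check using the standard-library sqlite3 module. It relies on feedparser exposing an RSS <guid> (or Atom <id>) as the entry's "id" key and <link> as its "link" key. The table layout and the function names are hypothetical, chosen only for illustration.

```python
import sqlite3
from typing import Iterable, List, Mapping, Optional


def entry_key(entry: Mapping) -> Optional[str]:
    """Hypothetical helper: prefer the guid/id, fall back to the link."""
    return entry.get("id") or entry.get("link") or None


def store_new_entries(
    conn: sqlite3.Connection, feed_url: str, entries: Iterable[Mapping]
) -> List[Mapping]:
    """Record unseen entries and return only those not crawled before."""
    # Assumption: keys are scoped per feed, since not every feed's guids are globally unique.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_entries ("
        " feed TEXT NOT NULL, entry_key TEXT NOT NULL,"
        " PRIMARY KEY (feed, entry_key))"
    )
    fresh: List[Mapping] = []
    for entry in entries:
        key = entry_key(entry)
        if key is None:
            # Neither guid nor link: this sketch simply skips such entries.
            continue
        cur = conn.execute(
            "INSERT OR IGNORE INTO seen_entries (feed, entry_key) VALUES (?, ?)",
            (feed_url, key),
        )
        # rowcount is 1 when the row was inserted, 0 when it already existed.
        if cur.rowcount == 1:
            fresh.append(entry)
    conn.commit()
    return fresh
```

Letting the primary key reject duplicates keeps the "seen?" test and the "remember it" step in a single statement. Unlike the timestamp comparison, this works the same way whether or not the feed provides dates.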


Source: https://habr.com/ru/post/1705490/

