My application should track RSS/Atom feeds and store new entries in the database. My question is: what is the most reliable way to determine whether a feed entry has already been crawled?
I use the Universal Feed Parser module to parse feeds. My current implementation records the latest feed.entries[i].updated_parsed value seen so far. While crawling, if an entry's updated_parsed value is greater than the recorded value, that entry is stored in the database. The problem is that many feeds have no published date or updated date.
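To make the current approach concrete, here is a minimal sketch of the timestamp comparison I described. The function name `select_new_entries` and the sample data are made up for illustration. It only assumes that each entry is dict-like with an optional `updated_parsed` time tuple, which is how Universal Feed Parser exposes entries.

```python
import time


def select_new_entries(entries, last_seen):
    """Return (new_entries, newest_stamp) by comparing updated_parsed values.

    entries   -- iterable of dict-like entries (hypothetical sample data here)
    last_seen -- 6-field time tuple from the previous crawl, or None on first run
    """
    new_entries = []
    newest = last_seen
    for entry in entries:
        updated = entry.get("updated_parsed")
        if updated is None:
            # No date at all: the comparison cannot classify this entry.
            # This is exactly the case the question is about.
            continue
        stamp = tuple(updated[:6])  # year, month, day, hour, minute, second
        if last_seen is None or stamp > last_seen:
            new_entries.append(entry)
        if newest is None or stamp > newest:
            newest = stamp
    return new_entries, newest


# Hypothetical entries: two dated, one without any date.
sample = [
    {"title": "a", "updated_parsed": time.gmtime(1000)},
    {"title": "b", "updated_parsed": time.gmtime(2000)},
    {"title": "c"},
]
last_seen = tuple(time.gmtime(1500)[:6])
fresh, newest = select_new_entries(sample, last_seen)
print([e["title"] for e in fresh])  # only "b" is newer than last_seen
```

Entry "c" is silently skipped because it has no date, so with this approach it would never be stored.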