Do nothing if the RSS feed has not changed.

I want to run a Python script every so many minutes. The script starts by loading a new article from the rss feed (using feedparser). What I want is when the new article is the same as the last time the script ends. How to do it?

+3
source share
3 answers

Since you mention feedparser in the python question, I assume you mean feedparser.org

If yes, then the easiest way to do this is to get the server to do most of the work for you and request updates only after the last change you selected: See ETag and Last-Modified headers for RSS feeds

+5
source

You can save the state in a temporary file. For instance. write the header to a temporary file if there is no temporary file, and next time read it from the file and compare the read header with the new header.

+3
source

There are several different approaches. The easiest way is probably to save a unique key or hash from the last article received for each feed that your program is associated with. This may be a combination of the article title and date, or even md5sum of the entire content of the article.

Then you can write this data to the XML status file for your script or use something like cpickle to save the data.Then each time your program starts, retrieve only the articles that are the most recent (checking each of your latest hash articles from the last run).

And, of course, don't forget to update your latest article feed hash before the script is released.

If your script deals with multiple channels, you will need to save one of these "last items of the article hash" on each channel.

0
source

Source: https://habr.com/ru/post/1444924/


All Articles