RSS Scraper

Can someone point me to a ready-made RSS screen scraper, preferably in Python, for getting full-text RSS feeds?

3 answers

Sorry, this does not exist in Python, although such tools do exist in PHP. You are more than welcome to use and improve the one I created, named scraped. It does not cover all sites: it is a recipe-based system that currently handles only NYT, WSJ, and the Economist. I am working on a general-purpose algorithm, but that is a serious undertaking, since it involves a lot of analysis of different kinds of HTML and XML. Even the three sites above need completely different cleanup logic, with WSJ being the most complex so far. They clutter their HTML with useless junk, mostly just to stop you.
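To make "recipe-based" concrete, here is a minimal sketch of how per-site cleanup functions could be dispatched by hostname. It is not the code of the project mentioned above: the site names, markers, and function names are all hypothetical, and the regex tag stripping is a stand-in for what lxml or BeautifulSoup would do in a real recipe.

```python
import re
from urllib.parse import urlparse


def _strip_tags(html):
    """Crude tag stripper for illustration only; a real recipe would parse the HTML."""
    text = re.sub(r"(?is)<(script|style).*?</\1>", "", html)  # drop script/style blocks
    text = re.sub(r"(?s)<[^>]+>", " ", text)                  # replace remaining tags with spaces
    return " ".join(text.split())                             # collapse whitespace


def _between(html, start_marker, end_marker):
    """Return the part of html between two markers, or all of it if the start is missing."""
    start = html.find(start_marker)
    if start == -1:
        return html
    start += len(start_marker)
    end = html.find(end_marker, start)
    return html[start:] if end == -1 else html[start:end]


# Hypothetical recipes: each site needs its own way of locating the article body.
def clean_site_a(html):
    return _strip_tags(_between(html, '<div id="article">', "</div>"))


def clean_site_b(html):
    return _strip_tags(_between(html, "<!-- body -->", "<!-- /body -->"))


# Hostname -> recipe. Supporting a new site means adding one entry here.
RECIPES = {
    "site-a.example": clean_site_a,
    "site-b.example": clean_site_b,
}


def extract_text(url, html):
    """Pick the recipe for the URL's host and apply it to the page HTML."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    recipe = RECIPES.get(host)
    if recipe is None:
        raise LookupError("no recipe for " + host)
    return recipe(html)
```

Keeping each site's logic in its own function means a layout change on one site only breaks that one recipe, which fits the point that the sites need completely different cleanup.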

It needs lxml; see the readme. It works with RSS feeds and XML in RSS 2.0 format. It uses lxml, BeautifulSoup, and feedparser.

http://tinyurl.com/yh3s9pa



You can use Feed Parser, for example:

import feedparser

# RSS feed of recent changes on the Python wiki
python_wiki_rss_url = ("http://www.python.org/cgi-bin/moinmoin/"
                       "RecentChanges?action=rss_rc")

# Download and parse the feed
feed = feedparser.parse(python_wiki_rss_url)

Then, for example:

# Print the title of every entry in the feed
for item in feed["items"]:
    print(item["title"])
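The loop above prints only titles, but a full-text scraper also needs each entry's link so that the article page can be fetched. As a rough sketch of the same iteration, the following uses the standard library's xml.etree.ElementTree instead of feedparser, so it runs without a network connection or a third-party install. The sample feed and the list_items helper are made up for illustration, and it assumes a plain RSS 2.0 document without namespaces.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document standing in for a downloaded feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Short summary one</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def list_items(rss_text):
    """Return a list of dicts with title, link, and summary for each <item>."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # Feeds often carry only a short summary here, not the full text.
            "summary": item.findtext("description", default=""),
        })
    return items


for entry in list_items(SAMPLE_RSS):
    print(entry["title"], entry["link"])
```

With feedparser, the same link is available as item["link"] inside the loop shown earlier.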

Source: https://habr.com/ru/post/1735018/

