RSS Scraper

Question

Can someone point me to a ready-made RSS screen scraper, preferably in Python, that produces full-text RSS feeds?

+3

python rss

James wanchai Mar 2 '10 at 9:28

3 answers

Try Feed Parser, for example:

import feedparser

python_wiki_rss_url = "http://www.python.org/cgi-bin/moinmoin/" \
                      "RecentChanges?action=rss_rc"

# Download and parse the feed in one call.
feed = feedparser.parse(python_wiki_rss_url)

Then, to go through the items:

for item in feed["items"]:
    # print() with a single argument works on both Python 2 and Python 3.
    print(item["title"])
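The loop above only prints titles. Since the question is about full text, here is a rough sketch of how you might check which items already carry the whole article and which only have a teaser (and so would need scraping from the linked page). It deliberately uses the standard library's `xml.etree.ElementTree` instead of feedparser so that it runs without installing anything; the sample feed, its URLs, and the function name `links_needing_scrape` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module, which defines <content:encoded>.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"

# Made-up RSS 2.0 feed: the first item has only a teaser, the second
# also ships the full article in <content:encoded>.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Summary only</title>
      <link>http://news.example.com/a</link>
      <description>Short teaser</description>
    </item>
    <item>
      <title>Full text included</title>
      <link>http://news.example.com/b</link>
      <description>Short teaser</description>
      <content:encoded><![CDATA[<p>The whole article.</p>]]></content:encoded>
    </item>
  </channel>
</rss>
"""


def links_needing_scrape(rss_xml):
    """Return the links of items that have no <content:encoded> body."""
    root = ET.fromstring(rss_xml)
    missing = []
    for item in root.iter("item"):
        full_text = item.findtext(CONTENT_NS + "encoded")
        if not (full_text and full_text.strip()):
            missing.append(item.findtext("link"))
    return missing


print(links_needing_scrape(SAMPLE_RSS))
```

Not every full-text feed uses `content:encoded` (some put the whole article in `description`), so treat this as a heuristic, not a rule.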

+3

Dominic Rodger Mar 2 '10 at 9:34

feedparser.org

+1

YOU Mar 2 '10 at 9:35

Recursion · Accepted Answer · 2010-03-02T09:43:45+0000

Sorry, nothing like this exists in Python, although there are some in PHP. You are more than welcome to use and improve the one I created, named scraped. It does not cover every site: it is a recipe-based system that currently only processes NYT, WSJ, and the Economist. I am working on a comprehensive algorithm, but that is a serious undertaking, involving a ton of analysis of different kinds of HTML and XML. Even the three sites mentioned above need completely different algorithms to clean up their pages, and WSJ's is the most complex to date. They clutter their HTML with so much useless crap, mostly just to stop you.

It requires lxml; see the readme. It takes an RSS feed as input and outputs RSS 2.0 XML. It relies on lxml, BeautifulSoup, and feedparser.

http://tinyurl.com/yh3s9pa
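To give an idea of what "recipe-based" can mean here, below is a minimal sketch, not the code behind the link. It assumes each recipe simply names the element that wraps the article body on one site, and it uses only the standard library's `html.parser` rather than the lxml/BeautifulSoup stack mentioned in the answer. The host names, tag/class values, and function names are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical recipes: which element wraps the article body on each site.
# These hosts and tag/class values are invented for illustration.
RECIPES = {
    "news.example.com": {"tag": "div", "class": "article-body"},
    "paper.example.org": {"tag": "section", "class": "story"},
}

SKIP_TAGS = {"script", "style"}


class BodyExtractor(HTMLParser):
    """Collect the text inside the first <tag class="... css_class ...">."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0         # nesting of `tag` inside the container; 0 = outside
        self.skipping = False  # True while inside <script> or <style>
        self.done = False      # True once the container has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == self.tag:
                self.depth += 1
            elif tag in SKIP_TAGS:
                self.skipping = True
        elif tag == self.tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.done or not self.depth:
            return
        if tag in SKIP_TAGS:
            self.skipping = False
        elif tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and not self.skipping and not self.done:
            self.chunks.append(data)


def extract_full_text(url, html):
    """Return the article text for a known site, or None if no recipe exists."""
    recipe = RECIPES.get(urlparse(url).hostname)
    if recipe is None:
        return None
    parser = BodyExtractor(recipe["tag"], recipe["class"])
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the markup.
    return " ".join(" ".join(parser.chunks).split())


SAMPLE_HTML = """
<html><body>
  <div class="nav">Home | World</div>
  <div class="article-body main">
    <p>First paragraph.</p>
    <script>trackReader();</script>
    <div class="caption">Photo caption.</div>
    <p>Second paragraph.</p>
  </div>
  <div class="footer">Copyright</div>
</body></html>
"""

print(extract_full_text("http://news.example.com/2010/03/story.html", SAMPLE_HTML))
```

Real pages are much messier than this sample, which is exactly why each site ends up needing its own cleanup logic. A recipe would probably also need a list of nested blocks to drop (ads, share widgets and so on).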
