RSS aggregator using Google App Engine - Python

I am trying to create a GAE application that processes an RSS feed and saves all the data from a feed in Google Datastore. I am using Minidom to extract content from an RSS feed. I also tried using Feedparser and BeautifulSoup, but they did not work for me.

My app currently parses the feed and saves it to the Datastore in about 25 seconds on my local machine. I deployed the application, and when I tried to use it, I got a DeadlineExceededError.

Are there any ways to speed up this process? The feed I use will eventually grow to more than 100 items.

3 answers

It should not take that long. Here's how you can use the Universal Feed Parser.

# easy_install feedparser 

And an example of its use:

    import feedparser

    feed = 'http://stackoverflow.com/feeds/tag?tagnames=python&sort=newest'
    d = feedparser.parse(feed)
    # Print the title of every entry in the feed
    for entry in d['entries']:
        print(entry.title)

The documentation shows how to get other fields out of the feed. If you have a specific problem, post the details.


I have found a way around this problem, although I'm not sure if this is the best solution.

Instead of Minidom, I used cElementTree to parse the RSS feed. I process each item tag and its children in a separate task and add these tasks to the task queue.
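The approach described above can be sketched roughly as follows. This is only an illustration, not the poster's actual code: it uses `xml.etree.ElementTree` (the module that superseded `cElementTree`), and `enqueue` is a hypothetical callable standing in for whatever adds a task to the App Engine task queue. The function name and the sample feed are likewise made up for the example.

```python
import xml.etree.ElementTree as ET


def split_feed_into_tasks(rss_xml, enqueue):
    """Parse an RSS document and pass each <item> to `enqueue` separately.

    `enqueue` is a hypothetical stand-in for the code that would add one
    task per item to the task queue.
    """
    root = ET.fromstring(rss_xml)
    count = 0
    for item in root.iter("item"):
        # Flatten the item's direct children into a small dict so that
        # each task carries only the data for a single item.
        payload = {child.tag: (child.text or "").strip() for child in item}
        enqueue(payload)
        count += 1
    return count


# Made-up sample feed used only to exercise the function.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Demo</title>
<item><title>First</title><link>http://example.com/1</link></item>
<item><title>Second</title><link>http://example.com/2</link></item>
</channel></rss>"""

queued = []
total = split_feed_into_tasks(SAMPLE_RSS, queued.append)
```

Here a plain list collects the payloads; in the real application each payload would instead be handed to a task, so that no single request has to parse and store the whole feed.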

This helped me avoid a DeadlineExceededError. However, I get the warning "This resource is using a lot of CPU resources."

Any idea on how to avoid the warning?

A_iyer


I have a demo/prototype GAE RSS reader that uses Feedparser: http://deliciourss.appspot.com/. Here is some of the code.

Fetch the feed.

    from google.appengine.api import urlfetch
    data = urlfetch.fetch(feedUrl)

Parse it with Feedparser.

    import feedparser
    parsedData = feedparser.parse(data.content)

Adjust some feed fields.

    # set main section to description if empty
    for ix in range(len(parsedData.entries)):
        bItem = 0
        if hasattr(parsedData.entries[ix], 'content'):
            for item in parsedData.entries[ix].content:
                if item.value:
                    bItem = 1
                    break
            if bItem == 0:
                parsedData.entries[ix].content[0].value = parsedData.entries[ix].summary
        else:
            parsedData.entries[ix].content = [{'value': parsedData.entries[ix].summary}]
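To make the fallback rule easier to follow, here is a self-contained sketch of the same idea. It assumes plain dicts as stand-ins for Feedparser entries, and `fill_empty_content` is a hypothetical helper name. Unlike the snippet above, it also treats an empty `content` list as missing.

```python
def fill_empty_content(entries):
    """Ensure every entry has content, falling back to its summary."""
    for entry in entries:
        content = entry.get("content")
        if content:
            # Content exists: overwrite it only if every part is empty.
            if not any(part.get("value") for part in content):
                content[0]["value"] = entry.get("summary", "")
        else:
            # No content at all: build it from the summary.
            entry["content"] = [{"value": entry.get("summary", "")}]
    return entries


# Made-up entries covering the three cases.
entries = [
    {"summary": "s1", "content": [{"value": ""}]},      # empty content
    {"summary": "s2"},                                   # no content
    {"summary": "s3", "content": [{"value": "body"}]},  # content present
]
fill_empty_content(entries)
```

After the call, the first two entries carry their summary as content, while the third keeps its original body.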

Template if you use Django / webapp

    <?xml version="1.0" encoding="utf-8"?>
    <channel>
      <title>{{parsedData.channel.title}}</title>
      <url>{{feedUrl}}</url>
      <id>{{parsedData.channel.id}}</id>
      <updated>{{parsedData.channel.updated}}</updated>
      {% for entry in parsedData.entries %}
      <item>
        <id>{{entry.id}}</id>
        <title>{{entry.title}}</title>
        <link>
          {% for link in entry.links %}
            {% ifequal link.rel "alternate" %}
              {{link.href|escape}}
            {% endifequal %}
          {% endfor %}
        </link>
        <author>{{entry.author_detail.name}}</author>
        <pubDate>{{entry.published}}</pubDate>
        <description>{{entry.summary|escape}}</description>
        {% for item in entry.content %}
          {% if item.value %}
          <content>{{item.value|escape}}</content>
          {% endif %}
        {% endfor %}
      </item>
      {% endfor %}
    </channel>

Source: https://habr.com/ru/post/1300671/

