How to parse an XML feed using Python?

I am trying to parse this XML feed (http://www.reddit.com/r/videos/top/.rss) and I am having problems. I want to save the YouTube link from each item element, but I cannot get past the channel child node. How do I get to that level so that I can then iterate over the items?

    #reddit parse
    reddit_file = urllib2.urlopen('http://www.reddit.com/r/videos/top/.rss')
    #convert to string:
    reddit_data = reddit_file.read()
    #close file because we dont need it anymore:
    reddit_file.close()

    #entire feed
    reddit_root = etree.fromstring(reddit_data)
    channel = reddit_root.findall('{http://purl.org/dc/elements/1.1/}channel')
    print channel

    reddit_feed=[]
    for entry in channel:
        #get description, url, and thumbnail
        desc = #not sure how to get this
        reddit_feed.append([desc])
2 answers

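Try findall('channel/item'). In an RSS 2.0 feed the channel and item elements are not in a namespace, so the path is written relative to the rss root element, without a namespace prefix: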

    import urllib2
    from xml.etree import ElementTree as etree

    #reddit parse
    reddit_file = urllib2.urlopen('http://www.reddit.com/r/videos/top/.rss')
    #convert to string:
    reddit_data = reddit_file.read()
    print reddit_data
    #close file because we dont need it anymore:
    reddit_file.close()

    #entire feed
    reddit_root = etree.fromstring(reddit_data)
    item = reddit_root.findall('channel/item')
    print item

    reddit_feed=[]
    for entry in item:
        #get description, url, and thumbnail
        desc = entry.findtext('description')
        reddit_feed.append([desc])
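
For readers on Python 3, the same idea can be tried without a network connection. The following is only a minimal sketch: the inline SAMPLE_RSS feed and the function name extract_descriptions are made up for illustration and are not part of the original answer. It also shows why the lookup in the question comes back empty: channel is not in the Dublin Core namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for the downloaded feed (assumption:
# an RSS 2.0 document whose channel/item elements have no namespace).
SAMPLE_RSS = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>videos</title>
    <item>
      <title>First video</title>
      <link>http://example.com/1</link>
      <description>first description</description>
    </item>
    <item>
      <title>Second video</title>
      <link>http://example.com/2</link>
      <description>second description</description>
    </item>
  </channel>
</rss>"""


def extract_descriptions(xml_text):
    """Return the description text of every item under rss/channel."""
    root = ET.fromstring(xml_text)  # root is the <rss> element
    # The path is relative to <rss>, so it starts at 'channel'.
    return [item.findtext("description") for item in root.findall("channel/item")]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_RSS)
    # The namespaced lookup from the question matches nothing here.
    print(root.findall("{http://purl.org/dc/elements/1.1/}channel"))
    print(extract_descriptions(SAMPLE_RSS))
```

With a real feed, the string passed to extract_descriptions would be the data read from the URL instead of SAMPLE_RSS.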

I wrote this for you using XPath expressions (tested successfully):

    from lxml import etree
    import urllib2

    headers = { 'User-Agent' : 'Mozilla/5.0' }
    req = urllib2.Request('http://www.reddit.com/r/videos/top/.rss', None, headers)
    reddit_file = urllib2.urlopen(req).read()

    reddit = etree.fromstring(reddit_file)

    for item in reddit.xpath('/rss/channel/item'):
        print "title =", item.xpath("./title/text()")[0]
        print "description =", item.xpath("./description/text()")[0]
        print "thumbnail =", item.xpath("./*[local-name()='thumbnail']/@url")[0]
        print "link =", item.xpath("./link/text()")[0]
        print "-" * 100
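
If lxml is not available, a similar result might be reached with the standard library by passing a prefix-to-namespace mapping to find() instead of using local-name(). This is a sketch under assumptions, not the answer's own method: it assumes the thumbnail element is a Media RSS media:thumbnail (namespace http://search.yahoo.com/mrss/), and the sample feed and the parse_items name are hypothetical. Unlike the [0] indexing above, it returns None for an item without a thumbnail rather than raising IndexError.

```python
import xml.etree.ElementTree as ET

# Assumption: thumbnails use the Media RSS namespace.
MEDIA_NS = {"media": "http://search.yahoo.com/mrss/"}

# Hypothetical sample feed used only for illustration.
SAMPLE_RSS = """<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>First video</title>
      <link>http://example.com/1</link>
      <description>first description</description>
      <media:thumbnail url="http://example.com/thumb1.jpg" />
    </item>
    <item>
      <title>No thumbnail</title>
      <link>http://example.com/2</link>
      <description>second description</description>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text):
    """Collect title, description, link and thumbnail URL for each item."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.findall("channel/item"):
        # Resolve the 'media' prefix through the mapping above.
        thumb = item.find("media:thumbnail", MEDIA_NS)
        entries.append({
            "title": item.findtext("title"),
            "description": item.findtext("description"),
            "link": item.findtext("link"),
            # Items without a thumbnail get None instead of an exception.
            "thumbnail": thumb.get("url") if thumb is not None else None,
        })
    return entries


if __name__ == "__main__":
    for entry in parse_items(SAMPLE_RSS):
        print(entry)
```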

Source: https://habr.com/ru/post/1439601/

