Scrapy xpath: select ancestor node

Question


I have a question about XPath. Given this HTML:

    <div id="A">
      <div class="B">
        <div class="C">
          <div class="item">
            <div class="area">
              <div class="sec">USA</div>
              <table>
                <tbody>
                  <tr>
                    <td><a href="">D1</a></td>
                    <td>D2</td>
                  </tr>
                  <tr class="even">
                    <td><a href="">E1</a></td>
                    <td>E2</td>
                  </tr>
                </tbody>
              </table>
            </div>
            <div class="area">
              <div class="sec">UK</div>
              <table>
                <tbody>
                  <tr>
                    <td><a href="">F1</a></td>
                    <td>F2</td>
                  </tr>
                </tbody>
              </table>
            </div>
          </div>
        </div>
      </div>
    </div>

My code is:

    from scrapy.selector import Selector

    # Runs inside a spider callback, where `response` is available
    sel = Selector(response)
    group = sel.xpath("//div[@id='A']/div[@class='B']/div[@class='C']/div[@class='item']/div[@class='area']/table/tbody/tr")
    for g in group:
        # section = g.xpath("").extract()  # ancestor???
        context = g.xpath("./td[1]/a/text()").extract()
        brief = g.xpath("./td[2]/text()").extract()
        # print(section[0])
        print(context[0])
        print(brief[0])

It prints:

    D1
    D2
    E1
    E2
    F1
    F2

But I want to print:

    USA
    D1
    D2
    USA
    E1
    E2
    UK
    F1
    F2

So for each row I need to reach back up to an ancestor node and read its section name, so that I can get USA and UK.
I've been stuck on this for a while.
Could someone show me how to do it? Thank you!


python xpath scrapy

user2492364 Oct 23 '14 at 8:16


3 answers

andrean · Answer 1 · 2014-10-23T08:46:27+0000

In XPath, you can move up the tree with `..`, so a selector like this might work for you:

 section = g.xpath('../../../div[@class="sec"]/text()').extract()
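To see what each `..` step does, here is a minimal sketch using lxml (the library Scrapy's selectors are built on) on a trimmed copy of the question's markup. The names `HTML`, `row`, `steps` and `section` are illustrative only. It walks up from the first `<tr>` one `..` at a time, then steps back down to the `div.sec` sibling:

```python
from lxml import etree

# Trimmed copy of the question's markup (one section only)
HTML = """
<div class="area">
  <div class="sec">USA</div>
  <table><tbody>
    <tr><td><a href="">D1</a></td><td>D2</td></tr>
  </tbody></table>
</div>
"""

root = etree.fromstring(HTML.strip())
row = root.xpath('//tr')[0]

# Each extra '..' climbs one level: tr -> tbody -> table -> div.area
steps = [row.xpath('..')[0].tag,
         row.xpath('../..')[0].tag,
         row.xpath('../../..')[0].tag]
print(steps)  # ['tbody', 'table', 'div']

# From div.area, go back down to its div.sec child
section = row.xpath('../../../div[@class="sec"]/text()')
print(section)  # ['USA']
```

Note that the count of `..` steps is tied to the exact nesting, which is why the answer suggests the `ancestor::` axis as a sturdier option.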

Although this works, it depends heavily on the exact structure of your document. If you need a bit more flexibility to cope with minor structural changes, you can look back up the tree for an ancestor instead:

 section = g.xpath('ancestor::div[@class="area"]/div[@class="sec"]/text()').extract()
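Put together, the loop from the question could look like the sketch below. It assumes lxml and a hard-coded copy of the question's HTML instead of a live Scrapy `response`; `extract_rows` is a hypothetical helper name. Inside a Scrapy callback the same XPath expressions work with `g.xpath(...).extract()`.

```python
from lxml import etree

# Copy of the question's markup, used here instead of a Scrapy response
HTML = """
<div id="A"><div class="B"><div class="C"><div class="item">
  <div class="area">
    <div class="sec">USA</div>
    <table><tbody>
      <tr><td><a href="">D1</a></td><td>D2</td></tr>
      <tr class="even"><td><a href="">E1</a></td><td>E2</td></tr>
    </tbody></table>
  </div>
  <div class="area">
    <div class="sec">UK</div>
    <table><tbody>
      <tr><td><a href="">F1</a></td><td>F2</td></tr>
    </tbody></table>
  </div>
</div></div></div></div>
"""

def extract_rows(markup):
    """Return a (section, context, brief) tuple for every table row."""
    root = etree.fromstring(markup.strip())
    rows = []
    for tr in root.xpath('//div[@class="area"]/table/tbody/tr'):
        # Climb to the enclosing div.area, then read its div.sec child
        section = tr.xpath('ancestor::div[@class="area"]/div[@class="sec"]/text()')[0]
        context = tr.xpath('./td[1]/a/text()')[0]
        brief = tr.xpath('./td[2]/text()')[0]
        rows.append((section, context, brief))
    return rows

for section, context, brief in extract_rows(HTML):
    print(section, context, brief)
```

This prints `USA D1 D2`, `USA E1 E2` and `UK F1 F2`, one row per line. Because `ancestor::div[@class="area"]` matches however many levels sit between the row and the section `div`, adding or removing a wrapper element (for example a missing `<tbody>`) does not change which section is found.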

Saurabh · Answer 2 · 2016-06-24T12:55:59+0000

http://www.tizag.com/xmlTutorial/xpathparent.php is a good reference on selecting parent nodes in XPath.

You can get a node's parent by appending `/..` to the path that selects the child, i.e. `child/..`.
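A minimal lxml sketch of the `child/..` idea, using a small made-up snippet (`SNIPPET` and the variable names are illustrative). It also shows `parent::*`, the long-form axis: in XPath 1.0, `..` is shorthand for `parent::node()`, and `parent::*` restricts the match to elements.

```python
from lxml import etree

# Hypothetical snippet: one section header and one link inside a table cell
SNIPPET = ('<div class="area"><div class="sec">USA</div>'
           '<table><tr><td><a href="">D1</a></td></tr></table></div>')
root = etree.fromstring(SNIPPET)

# 'child/..' selects the parent of whatever 'child' matched
parent_of_link = root.xpath('//a/..')[0]
print(parent_of_link.tag)  # td

# The same step written with the explicit parent axis
parent_of_sec = root.xpath('//div[@class="sec"]/parent::*')[0]
print(parent_of_sec.get('class'))  # area
```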

fewtalks · Answer 3 · 2014-10-23T08:56:06+0000

    from lxml import etree

    a = ('<div id="A"><div class="B"><div class="C"><div class="item">'
         '<div class="area"><div class="sec">USA</div> <table> <tbody> '
         '<tr> <td><a href="">D1</a></td> <td>D2</td> </tr> '
         '<tr class="even"> <td><a href="">E1</a></td> <td>E2</td> </tr> '
         '</tbody> </table> </div> '
         '<div class="area"> <div class="sec">UK</div> <table> <tbody> '
         '<tr> <td><a href="">F1</a></td> <td>F2</td> </tr> '
         '</tbody> </table> </div> </div> </div> </div> </div>')
    tree = etree.fromstring(a)
    # Collect every text node under each div.area, dropping whitespace-only ones
    print([t for t in tree.xpath('//div[@class="area"]//text()') if t.strip()])

Output: ['USA', 'D1', 'D2', 'E1', 'E2', 'UK', 'F1', 'F2']

`//` selects all descendants; `/` selects only direct children.
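The difference between `/` and `//` can be checked with a short lxml sketch on a made-up snippet modeled on the question's markup (`SNIPPET`, `direct`, `nested` and `child_tags` are illustrative names). The leading `.` keeps each search relative to the `area` element; a bare `//` would start from the document root instead.

```python
from lxml import etree

# Hypothetical snippet modeled on the question's markup
SNIPPET = ('<div class="area"><div class="sec">USA</div>'
           '<table><tbody><tr><td>D1</td><td>D2</td></tr></tbody></table></div>')
area = etree.fromstring(SNIPPET)

# '/' looks only at direct children: the <td> cells are nested deeper
direct = area.xpath('./td')
# '//' searches the whole subtree below area
nested = area.xpath('.//td')
# The actual direct children of area
child_tags = [el.tag for el in area.xpath('./*')]

print(len(direct), len(nested))  # 0 2
print(child_tags)  # ['div', 'table']
```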

