XPath to select only children (not empty text nodes)

Question


I am parsing some XML using Nokogiri and XPath. When I do this:

    doc.xpath('//Order/child::node()').each do |node|
      puts node.name
    end

It prints the name of every child node, but between the element names it also prints "text". I think I know why:

My XML has whitespace between the elements, e.g. "<a1>hi</a1> \n <a2>bye</a2>", and that whitespace is parsed as text nodes.

Is there a way to ignore the whitespace between nodes?


ruby xml xpath nokogiri

0xSina Jan 17 '12 at 3:52


2 answers

If you only need elements, use a better XPath: the node test * (as in /r/*) selects only child elements, while node() selects child nodes of every type, including text:

    require 'nokogiri'

    doc = Nokogiri.XML("<r><a>1</a>\n\t<b>2</b></r>")

    p doc.xpath('/r/child::node()').map(&:name)
    #=> ["a", "text", "b"]

    p doc.xpath('/r/*').map(&:name)
    #=> ["a", "b"]

Alternatively, you can ask Nokogiri to discard, while parsing, any text nodes that consist only of whitespace:

    doc2 = Nokogiri.XML("<r><a>1</a>\n\t<b>2</b></r>", &:noblanks)
    p doc2.xpath('/r/child::node()').map(&:name)
    #=> ["a", "b"]

Or you can use Ruby to further filter your NodeSet based on arbitrary criteria:

    mine = doc.xpath('/r/child::node()').select do |node|
      node.type != Nokogiri::XML::Node::TEXT_NODE || node.content =~ /\S/
    end
    p mine.map(&:name)
    #=> ["a", "b"]


Phrogz Jan 17 '12 at 4:15


Dimitre novatchev · Accepted Answer · 2012-01-17T14:21:58+0000

Use:

 //Order/node()[not(self::text()[not(normalize-space())])]

This selects all child nodes of every Order element except text nodes that consist entirely of whitespace. Elements, comments, and text nodes with real content are all kept.

XSLT-based verification:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>

     <xsl:template match="/*">
      <xsl:variable name="vSel1" select="//Order/node()"/>
      <xsl:variable name="vSel2" select=
      "//Order/node()[not(self::text()[not(normalize-space())])]"/>

      <xsl:for-each select="$vSel1">
       <xsl:value-of select="concat('&#xA;',position(), ': ')"/>
       <xsl:copy-of select="."/>
       <xsl:text>&#xA;</xsl:text>
      </xsl:for-each>
    ================
      <xsl:for-each select="$vSel2">
       <xsl:value-of select="concat('&#xA;',position(), ': ')"/>
       <xsl:copy-of select="."/>
       <xsl:text>&#xA;</xsl:text>
      </xsl:for-each>
     </xsl:template>
    </xsl:stylesheet>

When this transformation is applied to the following XML document:

    <t>
     <Order>
      <a/>
      <b>xxx</b>
      <c/>
     </Order>
     <Order>
      <d/>
      <e>xxx</e>
      <f/>
     </Order>
    </t>

the two XPath expressions are evaluated, and the nodes in each of the two selected node-sets are output, each preceded by its position number:

    1:
    2: <a/>
    3:
    4: <b>xxx</b>
    5:
    6: <c/>
    7:
    8:
    9: <d/>
    10:
    11: <e>xxx</e>
    12:
    13: <f/>
    14:
    ================
    1: <a/>
    2: <b>xxx</b>
    3: <c/>
    4: <d/>
    5: <e>xxx</e>
    6: <f/>
