Why does normalize-space(text()) ignore nested nodes when selecting by text?

Question


Why, in the following example, can I use //label[text()[normalize-space() = 'some label']] or //label[normalize-space(text()) = 'some label'] to select a label by its text while ignoring the span content? I really want to understand this. I found no explanation of this behavior in the description of the function at http://www.w3.org/TR/xpath/#function-normalize-space . This is exactly what I want, but I also desperately want to know why this solution works :)

BTW, which syntax is better: //label[text()[normalize-space() = 'some label']] vs //label[normalize-space(text()) = 'some label'] and why?

 <label> <span>some span</span> some label </label>
 <label> other label <span>other span</span> </label>

I am looking for a useful answer :)

+4

html xpath

master.py Nov 08 '14 at 16:49


2 answers

text() returns all text nodes that are children of the current node (label)

But the text "some span" is not a child of the label; it is a child of the span.

You can use .//text() to get all descendant text nodes of the label, or span/text() to get the text nodes of the span.

-

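You need to use //label[.//text()[normalize-space() = 'some label']] instead of //label[normalize-space(.//text()) = 'some label'], because the latter only works if there is a single text node. Note the leading dot: a bare //text() inside the predicate would search the whole document rather than the descendants of the label.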

+3

Benibela Nov 08 '14 at 17:16


Michael Kay · Accepted Answer · 2014-11-08T23:36:38+0000

This has nothing to do with normalize-space() and everything to do with text().

text() is an abbreviation for child::text() and selects the text nodes that are immediate children of the label element. Unless whitespace-only text nodes have been stripped, the first label element in your example has two child text nodes: one that is all whitespace, and one that contains "some label" surrounded by whitespace. The text "some span" belongs to the span element, so text() evaluated on the label never sees it.
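The point about child text nodes can be checked with a small sketch. It uses Python's standard xml.dom.minidom rather than an XPath engine, so it only emulates what child::text() selects. The <root> wrapper and the helper name child_text_nodes are assumptions added for illustration; they are not part of the question.

```python
from xml.dom import minidom

# The question's markup, wrapped in a hypothetical <root> element
# so that it parses as a single well-formed document.
DOC = (
    "<root>"
    "<label> <span>some span</span> some label </label> "
    "<label> other label <span>other span</span> </label>"
    "</root>"
)


def child_text_nodes(element):
    """Return the data of text nodes that are direct children (like child::text())."""
    return [n.data for n in element.childNodes if n.nodeType == n.TEXT_NODE]


labels = minidom.parseString(DOC).getElementsByTagName("label")
first_label_texts = child_text_nodes(labels[0])
second_label_texts = child_text_nodes(labels[1])

print(first_label_texts)   # [' ', ' some label ']
print(second_label_texts)  # [' other label ', ' ']
```

Neither list contains "some span" or "other span", because those text nodes are children of the span elements, not of the labels.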

 BTW, which syntax is better: //label[text()[normalize-space() = 'some label']] vs //label[normalize-space(text()) = 'some label'] and why?

They do different things; the better one is whichever does what you want to achieve.

In XPath 1.0, the first expression selects label elements that have a child text node whose value, after whitespace normalization, is "some label". The second selects label elements whose first child text node, after whitespace normalization, is "some label". This is because normalize-space() (like all functions that expect a string), when given a node-set, takes the string value of the first node in the node-set.
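The XPath 1.0 difference between the two expressions can be sketched as follows. This is an emulation in plain Python with xml.dom.minidom, not a real XPath evaluation: the <root> wrapper and the functions normalize_space, matches_any_text_node and matches_first_text_node are hypothetical names introduced here, and whitespace-only text nodes are assumed to be preserved.

```python
import re
from xml.dom import minidom

# The question's markup inside a hypothetical <root> wrapper.
DOC = (
    "<root>"
    "<label> <span>some span</span> some label </label> "
    "<label> other label <span>other span</span> </label>"
    "</root>"
)


def normalize_space(s):
    """Strip leading/trailing XML whitespace and collapse internal runs to one space."""
    return re.sub(r"[ \t\r\n]+", " ", s.strip(" \t\r\n"))


def child_texts(element):
    """Data of the direct child text nodes, in document order."""
    return [n.data for n in element.childNodes if n.nodeType == n.TEXT_NODE]


def matches_any_text_node(element, wanted):
    """Emulates label[text()[normalize-space() = wanted]]: any child text node may match."""
    return any(normalize_space(t) == wanted for t in child_texts(element))


def matches_first_text_node(element, wanted):
    """Emulates label[normalize-space(text()) = wanted] in XPath 1.0:
    only the first node of the node-set is converted to a string."""
    texts = child_texts(element)
    first = texts[0] if texts else ""  # an empty node-set converts to ""
    return normalize_space(first) == wanted


labels = minidom.parseString(DOC).getElementsByTagName("label")
any_hits = [matches_any_text_node(el, "some label") for el in labels]
first_hits = [matches_first_text_node(el, "some label") for el in labels]
first_hits_other = [matches_first_text_node(el, "other label") for el in labels]
print(any_hits, first_hits, first_hits_other)
```

Under these assumptions the first form matches the first label, while the second form does not, because the first child text node of that label is the whitespace before the span. The second form does match "other label" on the second label, where the wanted text happens to come first.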

In XPath 2.0, the first expression selects label elements that have a child text node whose value, after whitespace normalization, is "some label". The second selects label elements whose single child text node, after whitespace normalization, equals "some label", but it raises an error if the label element has more than one child text node. This is because normalize-space() (like all functions that expect a string) atomizes its argument and reports a type error if the length of the atomized sequence is greater than one.
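The XPath 2.0 behavior can be illustrated with a rough emulation. The function normalize_space_2_0 is a hypothetical stand-in that takes an already atomized sequence as a Python list of strings; a Python TypeError stands in for the XPath type error (to the best of my knowledge reported as XPTY0004), and str.split() only approximates XML whitespace handling.

```python
def normalize_space_2_0(items):
    """Emulate normalize-space() applied to an atomized sequence of strings.

    More than one item is a type error; an empty sequence yields "".
    """
    if len(items) > 1:
        raise TypeError("a sequence of more than one item is not allowed here")
    value = items[0] if items else ""
    # Approximation: str.split() also splits on non-XML whitespace characters.
    return " ".join(value.split())


# A label with a single child text node normalizes as expected.
single = normalize_space_2_0(["  only   text "])

# The first label in the question has two child text nodes, so this fails.
try:
    normalize_space_2_0([" ", " some label "])
    raised = False
except TypeError:
    raised = True

print(single, raised)
```

This is why the predicate form text()[normalize-space() = 'some label'] is the more robust choice: it tests each child text node individually and never passes more than one item to the function.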
