Strange behavior with TagSoup and Groovy XmlSlurper

Question


Let's say I want to parse the phone number out of the following XML string:

str = """ <root> 
            <address>123 New York, NY 10019
                <div class="phone"> (212) 212-0001</div> 
            </address> 
        </root> 
    """
parser = new XmlSlurper(new org.ccil.cowan.tagsoup.Parser()).parseText (str)
println parser.address.div.text()

It does not print the phone number.

If I change the div element to foo, like this:

str = """ <root> 
            <address>123 New York, NY 10019
                <foo class="phone"> (212) 212-0001</foo> 
            </address> 
        </root> 
    """
parser = new XmlSlurper(new org.ccil.cowan.tagsoup.Parser()).parseText (str)
println parser.address.foo.text()

then it is parsed and the phone number is printed.

What's happening?

By the way, I am using Groovy 1.7.5 and TagSoup 1.2.


xml parsing groovy tag-soup

user308808 Jan 27 '11 at 2:44


3 answers

Oleg Iavorskyi · Answer 1 · 2011-02-01T20:24:57+0000

Just change the code to:

println parser.address.'div'.text()

This is the curse of Groovy and many other dynamic languages: "div" is the name of an operator method (Groovy's / operator calls div()), so instead of getting the node you are effectively trying to divide the "address" node :)
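A minimal sketch of the quoted-name form, assuming Groovy 3 or later (where XmlSlurper lives in the groovy.xml package) and the plain XmlSlurper without TagSoup, so the parsed tree is exactly the markup from the question:

```groovy
import groovy.xml.XmlSlurper

// The question's markup, parsed as plain XML (no TagSoup involved)
def xml = '''<root>
  <address>123 New York, NY 10019
    <div class="phone"> (212) 212-0001</div>
  </address>
</root>'''

def doc = new XmlSlurper().parseText(xml)

// Quoted property name: GPath looks up the child element named div
def phone = doc.address.'div'.text().trim()
println phone
```

Quoting is also the usual way to reach element names that are not valid Groovy identifiers, for example a hypothetical 'phone-number' element.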

winstaan74 · Answer 2 · 2011-08-01T13:15:28+0000

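TagSoup treats the input as HTML and may change the markup before the slurper sees it. Since GPath matches element names exactly, the path has to follow the tree TagSoup actually builds. Try: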

println parser.ADDRESS.DIV.text()

If that doesn't match either, serialize the parsed document to see the structure GPath is actually working with:

println groovy.xml.XmlUtil.serialize(parser)
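To illustrate how an HTML-style repair breaks the original path, here is a sketch that skips TagSoup and parses a hand-written tree in which the div has been moved out of address (in HTML 4, address may only hold inline content, and div is a block element) and everything is wrapped in html/body. That repaired tree is an assumption for illustration, not captured TagSoup output; Groovy 3 or later is assumed for the groovy.xml package names.

```groovy
import groovy.xml.XmlSlurper
import groovy.xml.XmlUtil

// Hypothetical approximation of an HTML-style repair of the question's input:
// <div> sits beside <address> instead of inside it. Not real TagSoup output.
def repaired = '''<html><body><root>
  <address>123 New York, NY 10019</address>
  <div class="phone"> (212) 212-0001</div>
</root></body></html>'''

def doc = new XmlSlurper().parseText(repaired)

// Serializing shows the structure GPath actually navigates
def tree = XmlUtil.serialize(doc)
println tree

// The question's path now matches nothing: <div> is no longer under <address>
def viaAddress = doc.body.root.address.div
println "matches under address: ${viaAddress.size()}"

// A path that follows the repaired tree does find the number
def phone = doc.body.root.div.text().trim()
println phone
```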

Vanko · Answer 3 · 2016-07-22T12:46:54+0000

Another option is to search the whole document instead of relying on the exact nesting:

parser.'**'.findAll { it.name() == 'div' && it.@class.text() == 'phone' }.each { div ->
    println div.text()
}

This uses depthFirst ('**') to search all tags, filters for div elements whose class is phone, and prints the value (212) 212-0001.

Groovy version 2.4
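The same search can be sketched with an explicit depthFirst() call, run against a hypothetical repaired tree in which the div sits beside address rather than inside it (an assumption standing in for TagSoup's output, not captured from it). Groovy 3 or later is assumed:

```groovy
import groovy.xml.XmlSlurper

// Hypothetical "repaired" tree: <div> is a sibling of <address>, not a child
def repaired = '''<html><body><root>
  <address>123 New York, NY 10019</address>
  <div class="phone"> (212) 212-0001</div>
</root></body></html>'''

def doc = new XmlSlurper().parseText(repaired)

// depthFirst() visits every element, so the match does not depend on nesting
def phones = doc.depthFirst()
        .findAll { it.name() == 'div' && it.@class.text() == 'phone' }
        .collect { it.text().trim() }

phones.each { println it }
```

Because the search ignores where the div ends up, it should behave the same whether or not the parser restructures the markup.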
