Strange behavior with tagsoup and Groovy XmlSlurper

Let's say I want to parse the phone number out of the following XML string:

str = """ <root> 
            <address>123 New York, NY 10019
                <div class="phone"> (212) 212-0001</div> 
            </address> 
        </root> 
    """
parser = new XmlSlurper(new org.ccil.cowan.tagsoup.Parser()).parseText (str)
println parser.address.div.text()

It does not print the phone number.

If I change the div element to foo, like this:

str = """ <root> 
            <address>123 New York, NY 10019
                <foo class="phone"> (212) 212-0001</foo> 
            </address> 
        </root> 
    """
parser = new XmlSlurper(new org.ccil.cowan.tagsoup.Parser()).parseText (str)
println parser.address.foo.text()

Then it parses correctly and prints the phone number.

What's happening?

By the way, I am using Groovy 1.7.5 and tagsoup 1.2.

3 answers

Just change the code to

println parser.address.'div'.text()

This is the curse of Groovy and many other dynamic languages: "div" is a reserved method name (it is the method Groovy calls for the / operator), so you are not getting the node, but rather trying to divide the "address" node :)
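
For context on why the name is special: Groovy maps the / operator to a method named div. Below is a minimal sketch of that mapping, using a hypothetical Money class that is not part of the question. It illustrates only the operator convention and does not by itself show what XmlSlurper does with a property called div.

```groovy
// Hypothetical class, used only to illustrate operator overloading.
class Money {
    BigDecimal amount

    // Groovy calls this method when a Money instance appears on the left of '/'.
    Money div(Number divisor) {
        new Money(amount: amount / divisor)
    }
}

def total = new Money(amount: 10.0)
def half = total / 2          // same as total.div(2)
assert half.amount == 5
```

Quoting the name, as in address.'div', makes it explicit that a property lookup is intended.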


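Also keep in mind that tagsoup parses the input as HTML, so the resulting tree may not match the original markup exactly. Depending on how the element names come out, the GPath may need to be, for example: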

println parser.ADDRESS.DIV.text()

To see what the GPath should look like, print the parsed document:

println groovy.xml.XmlUtil.serialize(parser)

Another option, which does not depend on the exact nesting, is to search the whole tree:

parser.'**'.findAll { it.name() == 'div' && it.@class.text() == 'phone' }.each { div ->
    println div.text()
}
  • Use depthFirst ('**') to visit every tag;
  • Filter for elements named div whose class attribute is phone;
  • Print the value (212) 212-0001.

Groovy version 2.4
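
The depth-first search described above can be tried without tagsoup on the classpath. The sketch below makes an assumption not in the answer: it uses the plain XmlSlurper (auto-imported in Groovy 2.x; newer Groovy versions may need import groovy.xml.XmlSlurper), so it shows only the search itself. With tagsoup as the underlying parser, element names or nesting may differ.

```groovy
// Same document as in the question, parsed with the default XML parser
// (assumption: no tagsoup here, to keep the example dependency-free).
def xml = '''<root>
  <address>123 New York, NY 10019
    <div class="phone"> (212) 212-0001</div>
  </address>
</root>'''

def root = new XmlSlurper().parseText(xml)

// '**' is shorthand for depthFirst(): it walks every element in the tree.
def phones = root.'**'
        .findAll { it.name() == 'div' && it.@class.text() == 'phone' }
        .collect { it.text().trim() }

phones.each { println it }
```

Because the search matches on element name and attribute, it keeps working if the parser wraps or moves the div somewhere else in the tree.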


Source: https://habr.com/ru/post/1788089/

