XmlSlurper - listing both text nodes and regular nodes of an XHTML document

I am using Groovy's XmlSlurper to parse an XHTML document (or a pseudo-XHTML one), and I am trying to get at the text nodes of the document but cannot figure out how to do it. Here is the code:

import groovy.util.*

xmlText = '''
<TEXTFORMAT INDENT="10" LEADING="-5">
  <P ALIGN="LEFT">
    <FONT FACE="Garamond Premr Pro" SIZE="20" COLOR="#001200" LETTERSPACING="0" KERNING="0">
      Less is more! this 
      <FONT COLOR="#FFFF00">should be all</FONT>
      the 
      <FONT COLOR="#00FF00"> words OR should some </FONT>
      OTHER WORDS will be there?
    </FONT>
  </P>
</TEXTFORMAT>
'''
records = new XmlSlurper().parseText(xmlText)
records.P.FONT.children().eachWithIndex {it, index -> println "${index} - ${it}"} 

This prints the following output:

0 - should be all 
1 -  words OR should some

But I want it to print the contents of text nodes, so the desired result is:

0 - Less is more! this
1 - should be all
2 - the 
3 - words OR should some
4 - OTHER WORDS will be there?

Any ideas?

1 answer

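XmlSlurper doesn't seem to have a separate method for extracting "mixed content", that is, text nodes interleaved with child elements. This is why children() in the question only yields the two nested FONT elements.

There is an open issue in the Groovy JIRA requesting a method for mixed-content support.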
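
One possible workaround, sketched here under the assumption that switching from XmlSlurper to XmlParser is acceptable: the children() list of an XmlParser Node contains both child Node objects and plain String entries for text, so mixed content can be walked in document order. The variable names (root, font, pieces) are illustrative only, and the import assumes Groovy 3 or later.

```groovy
import groovy.xml.XmlParser   // assumes Groovy 3+; on Groovy 2.x the class is groovy.util.XmlParser

def xmlText = '''
<TEXTFORMAT INDENT="10" LEADING="-5">
  <P ALIGN="LEFT">
    <FONT FACE="Garamond Premr Pro" SIZE="20" COLOR="#001200" LETTERSPACING="0" KERNING="0">
      Less is more! this 
      <FONT COLOR="#FFFF00">should be all</FONT>
      the 
      <FONT COLOR="#00FF00"> words OR should some </FONT>
      OTHER WORDS will be there?
    </FONT>
  </P>
</TEXTFORMAT>
'''

def root = new XmlParser().parseText(xmlText)

// The outer FONT element holds the mixed content.
def font = root.P[0].FONT[0]

// children() yields Node objects for elements and Strings for text,
// in document order. Convert each entry to its text.
def rawPieces = font.children().collect { child ->
    child instanceof Node ? child.text() : child.toString()
}

// Trim explicitly and drop whitespace-only entries, so the result does
// not depend on the parser's whitespace settings.
def pieces = rawPieces.collect { it.trim() }.findAll { it }

pieces.eachWithIndex { text, index -> println "${index} - ${text}" }
```

This only looks one level deep; if the nested FONT elements could themselves contain mixed content, the same collect step would have to be applied recursively.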


Source: https://habr.com/ru/post/1709410/