Retrieving attribute value in XML with regex

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE ... ]> 
<abc-config version="THIS" id="abc">
...
</abc-config>

Hello everybody,

In the above code, how can I extract the value of the version attribute using Regex in Groovy / Java?

Thanks.

+3
3 answers

The regular expression for processing might be something like this:

/<\?xml version="([0-9.]+)"/

I'll spare you the 10,000th lecture about not using regex to parse markup languages.

Edit: He whose name cannot be expressed in the Basic Multilingual Plane, he compelled me.
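
Note that the pattern above captures the version in the XML declaration (1.0), not the version attribute of the abc-config element that the question asks about. A minimal Java sketch targeting the element instead might look like the following. It assumes the attribute value is double-quoted and that the element is named abc-config as in the question; the class and method names are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VersionAttributeRegex {
    // Matches the opening <abc-config ...> tag and captures the double-quoted
    // value of its version attribute. Assumption: no '>' appears inside any
    // attribute value that precedes version.
    private static final Pattern VERSION = Pattern.compile(
            "<abc-config\\b[^>]*?\\sversion\\s*=\\s*\"([^\"]*)\"");

    // Hypothetical helper: returns the attribute value, or empty if not found.
    static Optional<String> extractVersion(String xml) {
        Matcher m = VERSION.matcher(xml);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
                + "<abc-config version=\"THIS\" id=\"abc\">\n"
                + "</abc-config>";
        System.out.println(extractVersion(xml).orElse("(not found)"));
    }
}
```

Anchoring the pattern on the element name keeps it from picking up the version pseudo-attribute of the XML declaration, but it is still a regex and will break on comments, CDATA sections, or unusual quoting.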

+2

I know you asked for a regex, but what's wrong with doing this in Groovy?

Assuming the xml looks something like this:

def xml= '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE abc-config>
<abc-config version="THIS" id="abc">
  <node></node>
</abc-config>'''

Then you can parse it with:

def n = new XmlSlurper().parseText( xml )

And then this line:

println n.@version

prints: THIS


If the DOCTYPE causes problems, you can configure the parser not to load the external DTD:

def parser = new XmlSlurper()
parser.setFeature( "http://apache.org/xml/features/nonvalidating/load-external-dtd", false )
parser.setFeature( "http://xml.org/sax/features/namespaces", false )
parser.parseText( xml )
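
Since the question also mentions Java, roughly the same approach can be sketched with the JDK's DOM parser. This assumes the default JDK parser (Xerces-based), which recognizes the load-external-dtd feature used above; the class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class VersionAttributeDom {
    // Hypothetical helper: parses the XML and returns the version attribute
    // of the root element (an empty string if the attribute is absent).
    static String readVersion(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Assumption: the parser supports this Xerces feature; it stops
            // the parser from fetching an external DTD named in the DOCTYPE.
            factory.setFeature(
                    "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("version");
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers get one failure type.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
                + "<!DOCTYPE abc-config>\n"
                + "<abc-config version=\"THIS\" id=\"abc\">\n"
                + "  <node></node>\n"
                + "</abc-config>";
        System.out.println(readVersion(xml));
    }
}
```

Unlike a regex, the parser handles either quote style, attribute order, and whitespace without any extra work.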


+2

Not a Java regular expression, but a Perl regular expression:
/<\w+\s+[^>]*?(?<=\s)version\s*=\s*["'](.+?)["'][^>]*?\s*\/?>/sg

Please note that this fails on many levels. I could fill the page with a correct regular expression, but I have no desire to.

This also fails:
/<\w+\s+[^>]*?(?<=\s)version\s*=\s*(".+?"|'.+?')[^>]*?\s*\/?>/sg

So does this:
/<\w+\s+[^>]*?(?<=\s)version\s*=\s*(["'])(.+?)\1[^>]*?\s*\/?>/sg
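
One concrete failure: `<\w+` cannot match a hyphenated element name such as abc-config, so none of these patterns match the XML in the question. A rough Java port of the back-reference variant, with that one point adjusted, could look like the sketch below. The widened tag-name class `[\w:.-]+` is my own change rather than part of the original pattern, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VersionAttributeScan {
    // Port of the back-reference variant. Pattern.DOTALL stands in for Perl's
    // /s flag, and the find() loop below stands in for /g.
    // Assumption: [\w:.-]+ replaces \w+ so hyphenated tag names can match.
    private static final Pattern VERSION = Pattern.compile(
            "<[\\w:.-]+\\s+[^>]*?(?<=\\s)version\\s*=\\s*([\"'])(.+?)\\1[^>]*?\\s*/?>",
            Pattern.DOTALL);

    // Hypothetical helper: collects the version value of every matching tag.
    static List<String> findVersions(String xml) {
        List<String> values = new ArrayList<>();
        Matcher m = VERSION.matcher(xml);
        while (m.find()) {
            values.add(m.group(2)); // group 1 is the quote, group 2 the value
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
                + "<abc-config version=\"THIS\" id=\"abc\">\n"
                + "</abc-config>";
        System.out.println(findVersions(xml));
    }
}
```

The XML declaration is skipped because `<` must be followed by a name character, not `?`. The other caveats from the answer still apply: this remains a pattern match, not a parse.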

0

Source: https://habr.com/ru/post/1790573/

