Python: ignoring namespaces in xml.etree.ElementTree?

How can I tell ElementTree to ignore namespaces in an XML file?

For example, I would prefer to query for modelVersion (as in statement 1) rather than {http://maven.apache.org/POM/4.0.0}modelVersion (as in statement 2).

    pom = """
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
    </project>
    """

    from xml.etree import ElementTree

    ElementTree.register_namespace("", "http://maven.apache.org/POM/4.0.0")
    root = ElementTree.fromstring(pom)

    # Statement 1: bare tag name, finds nothing
    print(1, root.findall('modelVersion'))
    # Statement 2: fully qualified tag name, finds the element
    print(2, root.findall('{http://maven.apache.org/POM/4.0.0}modelVersion'))

Output:

    1 []
    2 [<Element '{http://maven.apache.org/POM/4.0.0}modelVersion' at 0x1006bff10>]
4 answers

There seems to be no direct way, so I would just wrap the find calls, for example:

    from xml.etree import ElementTree as ET

    POM = """
    <project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
    </project>
    """

    # The prefix is arbitrary; it only has to map to the document's namespace URI.
    NSPS = {'foo': "http://maven.apache.org/POM/4.0.0"}  # sic!

    def findall(node, tag):
        # Qualify the bare tag name with the prefix before searching.
        return node.findall('foo:' + tag, NSPS)

    root = ET.fromstring(POM)
    print([ET.tostring(el, encoding="unicode") for el in findall(root, 'modelVersion')])

output:

 ['<ns0:modelVersion xmlns:ns0="http://maven.apache.org/POM/4.0.0">4.0.0</ns0:modelVersion>\n'] 
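On Python 3.8 and later, ElementTree path expressions also accept a `{*}` wildcard in place of the namespace, which may make the wrapper unnecessary. A minimal sketch, assuming Python 3.8+ and a trimmed-down copy of the POM above (the variable names `matches` and `versions` are only illustrative):

```python
from xml.etree import ElementTree as ET

POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
</project>
"""

root = ET.fromstring(POM)

# '{*}tag' matches 'tag' in any namespace (or in none); needs Python 3.8+.
matches = root.findall('{*}modelVersion')
versions = [el.text for el in matches]
print(versions)
```

The matched elements still carry their fully qualified tag names, so the tree itself is left unchanged.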

This is what I'm doing right now, which makes me incredibly confident that there is a better way.

    $ cat pom.xml | tr '\n' ' ' \
        | sed 's/<project [^>]*>/<project>/' \
        | myprogram \
        | sed 's/<project>/<project xmlns="http:\/\/maven.apache.org\/POM\/4.0.0" xmlns:xsi="http:\/\/www.w3.org\/2001\/XMLSchema-instance" xsi:schemaLocation="http:\/\/maven.apache.org\/POM\/4.0.0 http:\/\/maven.apache.org\/maven-v4_0_0.xsd">/'

Instead of ignoring namespaces, another approach is to strip them from the tree, so there is nothing left to "ignore". See neonagon's answer to this question (and my extension of it to handle namespaces in attributes): Python ElementTree module: how to ignore the namespace of XML files to locate a matching element when using the "find" and "findall" methods.
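A rough sketch of that stripping approach, for illustration only. `strip_namespaces` is a hypothetical helper name and this is not the code from the linked answer; it assumes the whole document fits in memory and that dropping the namespace does not make two tag or attribute names collide:

```python
from xml.etree import ElementTree as ET

POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
</project>
"""

def strip_namespaces(root):
    """Drop the '{uri}' part of every tag and attribute name, in place."""
    for el in root.iter():
        # Only string tags can carry a namespace; anything else is left alone.
        if isinstance(el.tag, str) and el.tag.startswith('{'):
            el.tag = el.tag.split('}', 1)[1]
        # Iterate over a copy of the keys because the dict is modified.
        for name in list(el.attrib):
            if name.startswith('{'):
                el.attrib[name.split('}', 1)[1]] = el.attrib.pop(name)
    return root

root = strip_namespaces(ET.fromstring(POM))
found = root.findall('modelVersion')
print([el.text for el in found])
```

The namespace information is lost after this step, so a tree serialized afterwards no longer carries the original xmlns declarations.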


Here's an equivalent solution without using a shell. Main idea:

  • translate <project junk...> to <project>
  • perform clean processing without worrying about namespace
  • translate <project> back to <project junk...>

with new code:

    import re
    from xml.etree import ElementTree

    pom = """
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
    </project>
    """

    short_project = "<project>"
    long_project = """<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">"""

    # Eliminate the namespace specs
    pom = re.sub(r'<project [^>]*>', short_project, pom)
    root = ElementTree.fromstring(pom)
    ElementTree.dump(root)

    print(1, root.findall('modelVersion'))
    print(2, root.findall('{http://maven.apache.org/POM/4.0.0}modelVersion'))
    mv = root.findall('modelVersion')

    # Restore the namespace specs; encoding="unicode" makes tostring return str
    pom = ElementTree.tostring(root, encoding="unicode")
    pom = pom.replace(short_project, long_project)

Source: https://habr.com/ru/post/1237380/

