JDOM2 XPath: finding nodes in a different namespace

I am trying to use JDOM2 to extract the information I need from an XML document. How do I get a tag nested inside another tag?

I was only partially successful. Although I managed to use XPath to retrieve the <record> tags, an XPath query for the title, description, and other data inside the record tags returned null.

I have successfully used XPath to extract the <record> tags from the document. To do this, I use the query "//oai:record", where "oai" is a namespace prefix I created for use in XPath.

You can see the XML document I am working with here, and I have included a sample below: http://memory.loc.gov/cgi-bin/oai2_0?verb=ListRecords&set=cwp&metadataPrefix=oai_dc

    <record>
      <header>
        <identifier>oai:lcoa1.loc.gov:loc.pnp/cph.3a02293</identifier>
        <datestamp>2009-05-27T07:22:37Z</datestamp>
        <setSpec>cwp</setSpec>
        <setSpec>lcphotos</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>Jubal A. Early</dc:title>
          <dc:description>This record contains unverified, old data from caption card.</dc:description>
          <dc:date>[between 1860 and 1880]</dc:date>
          <dc:type>image</dc:type>
          <dc:type>still image</dc:type>
          <dc:identifier>http://hdl.loc.gov/loc.pnp/cph.3a02293</dc:identifier>
          <dc:language>eng</dc:language>
          <dc:rights>No known restrictions on publication.</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>

If you look at the larger document, you will see that no "xmlns" attribute is specified on any of the tags. There is also the issue that the document uses three different namespaces ("none / oai", "oai_dc", "dc").

What happens is that the XPath does not match anything, and the evaluateFirst(parent) method returns null.

Here is some of the code I use to extract the title, date, description, etc. from a record element.

    XPathFactory xpf = XPathFactory.instance();
    XPathExpression<Element> xpath = xpf.compile(
            "//dc:title",
            Filters.element(),
            null,
            namespaceList.toArray(new Namespace[namespaceList.size()]));
    Element tag = xpath.evaluateFirst(parent);
    if (tag != null) {
        return Option.fromString(tag.getText());
    }
    return Option.none();

Any thoughts would be appreciated! Thank you.

1 answer

In your XML, the dc prefix is mapped to the namespace URI http://purl.org/dc/elements/1.1/, so make sure the prefix mapping you pass to the XPath uses that same URI. This is the part of your XML where the namespace prefixes are declared:

    <oai_dc:dc
        xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">

The XML parser only sees namespaces that are explicitly declared in the XML. It will not try to open the namespace URL, because a namespace name is not necessarily a URL. For example, the following URI, which I found in this recent SO question, is also a valid namespace: uuid:ebfd9-45-48-a9eb-42d
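
To illustrate the prefix mapping described above, here is a minimal sketch. It deliberately uses the JDK's built-in javax.xml.xpath API rather than JDOM2, so that it runs without extra dependencies; in JDOM2 the corresponding step would be to include Namespace.getNamespace("dc", "http://purl.org/dc/elements/1.1/") among the namespaces passed to XPathFactory.compile. The class name NamespaceXPathSketch, the helper names bind and firstTitle, the cut-down XML string, and the http://example.org/wrong URI are all assumptions made for this illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceXPathSketch {

    // Cut-down sample (hypothetical): only the metadata part of one record,
    // carrying the namespace declarations quoted in the answer.
    static final String XML =
          "<metadata>"
        + "<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\""
        + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
        + "<dc:title>Jubal A. Early</dc:title>"
        + "</oai_dc:dc>"
        + "</metadata>";

    // Binds exactly one prefix to one namespace URI for use inside XPath.
    static NamespaceContext bind(final String prefix, final String uri) {
        return new NamespaceContext() {
            @Override public String getNamespaceURI(String p) {
                return prefix.equals(p) ? uri : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String namespaceUri) {
                return uri.equals(namespaceUri) ? prefix : null;
            }
            @Override public Iterator<String> getPrefixes(String namespaceUri) {
                return uri.equals(namespaceUri)
                        ? Collections.singletonList(prefix).iterator()
                        : Collections.<String>emptyIterator();
            }
        };
    }

    // Returns the text of the first <prefix>:title element, or "" if the
    // XPath matches nothing.
    static String firstTitle(String prefix, String uri) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // the JDK parser ignores namespaces by default
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(bind(prefix, uri));
            return xpath.evaluate("//" + prefix + ":title", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same prefix and URI as the document: matches.
        System.out.println(firstTitle("dc", "http://purl.org/dc/elements/1.1/"));
        // Different prefix, same URI: still matches, since only the URI counts.
        System.out.println(firstTitle("d", "http://purl.org/dc/elements/1.1/"));
        // Same prefix, different URI: matches nothing, so the result is empty.
        System.out.println(firstTitle("dc", "http://example.org/wrong").isEmpty());
    }
}
```

The point of the sketch is that matching is done on the namespace URI, not on the prefix text: a prefix that looks right but is bound to a different URI (or to none) selects nothing, which is consistent with the null result described in the question.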


Source: https://habr.com/ru/post/1238045/

