I am trying to use JDOM2 to extract the information I need from an XML document. How do I get a tag that is nested inside another tag?
I was only partially successful. I managed to use XPath to retrieve the <record> tags, but an XPath query for the title, description, and other data inside the record tags returned null.
I have successfully used XPath to extract the <record> tags from the document. To do this, I use the query "//oai:record", where "oai" is a namespace prefix I created for use with XPath.
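As a rough illustration of what that prefix binding does, here is a minimal sketch. It uses the JDK's built-in javax.xml.xpath API rather than JDOM2 so that it runs without extra dependencies. The class name, the trimmed sample XML, and the helper methods are made up for illustration; the URI is the OAI-PMH 2.0 namespace.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RecordQuerySketch {
    // Trimmed stand-in for the real response (hypothetical data).
    // The default namespace is declared once, on the root element.
    static final String XML =
        "<OAI-PMH xmlns=\"http://www.openarchives.org/OAI/2.0/\">"
      + "<ListRecords>"
      + "<record><header><identifier>a</identifier></header></record>"
      + "<record><header><identifier>b</identifier></header></record>"
      + "</ListRecords></OAI-PMH>";

    // Maps XPath prefixes to namespace URIs; only getNamespaceURI is needed here.
    static NamespaceContext context(Map<String, String> prefixes) {
        return new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return prefixes.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
            }
            @Override public String getPrefix(String uri) { return null; }
            @Override public Iterator<String> getPrefixes(String uri) { return null; }
        };
    }

    // Counts the elements matched by "//oai:record".
    static int countRecords() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // without this, prefixed queries cannot match
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xp = XPathFactory.newInstance().newXPath();
            // "oai" is an arbitrary prefix; only the URI has to match the document.
            xp.setNamespaceContext(
                    context(Map.of("oai", "http://www.openarchives.org/OAI/2.0/")));
            NodeList records =
                    (NodeList) xp.evaluate("//oai:record", doc, XPathConstants.NODESET);
            return records.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countRecords());
    }
}
```

The prefix used in the query does not have to appear in the document. What matters is that the URI bound to it equals the namespace URI of the elements.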
The XML document I am working with is at http://memory.loc.gov/cgi-bin/oai2_0?verb=ListRecords&set=cwp&metadataPrefix=oai_dc and a sample record is included below:
<record>
  <header>
    <identifier>oai:lcoa1.loc.gov:loc.pnp/cph.3a02293</identifier>
    <datestamp>2009-05-27T07:22:37Z</datestamp>
    <setSpec>cwp</setSpec>
    <setSpec>lcphotos</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:title>Jubal A. Early</dc:title>
      <dc:description>This record contains unverified, old data from caption card.</dc:description>
      <dc:date>[between 1860 and 1880]</dc:date>
      <dc:type>image</dc:type>
      <dc:type>still image</dc:type>
      <dc:identifier>http://hdl.loc.gov/loc.pnp/cph.3a02293</dc:identifier>
      <dc:language>eng</dc:language>
      <dc:rights>No known restrictions on publication.</dc:rights>
    </oai_dc:dc>
  </metadata>
</record>
If you look at the larger document, you will see that no "xmlns" attribute is specified on any of these tags. There is also the issue that the document uses three different namespaces (none/"oai", "oai_dc", and "dc").
What happens is that the XPath expression does not match anything, and the evaluateFirst(parent) method returns null.
Here is some of my code to extract the name, date, description, etc. from a record element.
XPathFactory xpf = XPathFactory.instance();
XPathExpression<Element> xpath = xpf.compile(
        "//dc:title",
        Filters.element(),
        null,
        namespaceList.toArray(new Namespace[namespaceList.size()]));
Element tag = xpath.evaluateFirst(parent);
if (tag != null) {
    return Option.fromString(tag.getText());
}
return Option.none();
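For comparison, the kind of nested lookup I am after can be sketched with the JDK's built-in javax.xml.xpath API (not JDOM2, so this says nothing definite about why my JDOM2 query returns null). The class name, sample XML, and helper method are hypothetical, and I am assuming the standard Dublin Core namespace URI for "dc". The sketch contrasts a query relative to the record (".//dc:title") with one that starts at the document root ("//dc:title"), and shows what happens when the prefix is bound to a URI that does not match the document.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class TitleLookupSketch {
    static final String OAI = "http://www.openarchives.org/OAI/2.0/";
    static final String OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/";
    static final String DC = "http://purl.org/dc/elements/1.1/"; // assumed Dublin Core URI

    // Two trimmed records (hypothetical data) so the two query styles can differ.
    static final String XML =
        "<OAI-PMH xmlns=\"" + OAI + "\"><ListRecords>"
      + "<record><metadata>"
      + "<oai_dc:dc xmlns:oai_dc=\"" + OAI_DC + "\" xmlns:dc=\"" + DC + "\">"
      + "<dc:title>First title</dc:title></oai_dc:dc></metadata></record>"
      + "<record><metadata>"
      + "<oai_dc:dc xmlns:oai_dc=\"" + OAI_DC + "\" xmlns:dc=\"" + DC + "\">"
      + "<dc:title>Jubal A. Early</dc:title></oai_dc:dc></metadata></record>"
      + "</ListRecords></OAI-PMH>";

    // Evaluates titleExpr with the record at the given 1-based position as context.
    // dcUri is the URI bound to the "dc" prefix. Returns null when nothing matches.
    static String titleOfRecord(int position, String titleExpr, String dcUri) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            Map<String, String> prefixes = Map.of("oai", OAI, "dc", dcUri);
            XPath xp = XPathFactory.newInstance().newXPath();
            xp.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return prefixes.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
                }
                @Override public String getPrefix(String uri) { return null; }
                @Override public Iterator<String> getPrefixes(String uri) { return null; }
            });

            Node record = (Node) xp.evaluate(
                    "(//oai:record)[" + position + "]", doc, XPathConstants.NODE);
            Node title = (Node) xp.evaluate(titleExpr, record, XPathConstants.NODE);
            return title == null ? null : title.getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Relative to the second record.
        System.out.println(titleOfRecord(2, ".//dc:title", DC));
        // "//" starts at the document root, regardless of the context node.
        System.out.println(titleOfRecord(2, "//dc:title", DC));
        // Prefix bound to a URI the document does not use.
        System.out.println(titleOfRecord(2, ".//dc:title", "http://example.org/wrong"));
    }
}
```

In this sketch the relative query yields the second record's own title, the root-anchored query yields the first title in the document, and the mismatched URI yields null. Whether any of these corresponds to my JDOM2 problem is something I have not confirmed.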
Any thoughts would be appreciated! Thank you