I need to parse several thousand XML documents to make sure that some of them contain a specific construct. The problem is that some of the documents do not contain well-formed XML.
The main idea was to use fn:collection() and search inside the returned nodes. But this only works if all the documents in the collection are well-formed.
Is it possible to do something similar, but only parse well-formed documents?
This is my simplified XSLT, which works if all the documents in $dir are well-formed:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>
  <xsl:variable name="dir" as="xs:string">file:/c:/path/to/files/</xsl:variable>
  <xsl:variable name="files" select="concat($dir, '?select=*.xml')" as="xs:string"/>
  <xsl:template match="/">
    <xsl:variable name="docs" select="collection($files)"/>
    <!-- Names of all elements that carry the attribute, across every document -->
    <xsl:variable name="names" select="
        for $i in $docs return
          distinct-values($i//*[exists(@an-attribute-to-find)]/local-name())"/>
    <!-- One name per line; a literal line break inside the attribute would be normalized to a space -->
    <xsl:value-of select="distinct-values($names)" separator="&#10;"/>
  </xsl:template>
</xsl:stylesheet>
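One variation that may be worth trying, assuming the processor is Saxon (the `?select=` query syntax in the collection URI is a Saxon convention, not part of standard XSLT): Saxon documents an `on-error` query parameter for directory collections, with the values `fail`, `warning` and `ignore`. The sketch below only changes the `files` variable; whether it behaves as hoped depends on the Saxon version in use, so treat it as an assumption to test rather than a confirmed fix.

```xml
<!-- Saxon-specific assumption: on-error=ignore skips files that fail to parse
     instead of aborting the whole collection; on-error=warning also reports them -->
<xsl:variable name="files"
              select="concat($dir, '?select=*.xml;on-error=ignore')"
              as="xs:string"/>
```

With other processors the collection URI is interpreted differently, so this parameter would probably have no effect there.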