Insert a DOCTYPE into an XML document (Java / SAX)

Suppose you have an XML document and a DTD, but the document itself does not contain a DOCTYPE declaration. How would you insert one, preferably by specifying it on the parser (similar to how you can set the schema for a document to be parsed), or by injecting the necessary SAX events through an XMLFilter or the like?

I found many references to EntityResolver, but EntityResolver is only called when a DOCTYPE is found during parsing, and it is used to point to a local DTD file. EntityResolver2 seems to have what I'm looking for, but I haven't found any usage examples.
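To make the limitation concrete, here is a minimal Java sketch of the usual EntityResolver pattern. It only helps because the sample document already carries a DOCTYPE; the class name `LocalDtdResolverDemo`, the method `parseWithResolver`, and the inline DTD text are assumptions made for illustration, not part of the original question.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: redirects the DTD named in an existing DOCTYPE to local content.
class LocalDtdResolverDemo {

    // Parses the XML and returns the concatenated character data.
    public static String parseWithResolver(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // Called by the parser when it meets the external DTD declared in the DOCTYPE.
        EntityResolver resolver = (publicId, systemId) -> {
            if (systemId != null && systemId.endsWith("entity.dtd")) {
                // Assumed DTD content, supplied in memory instead of from a file.
                return new InputSource(new StringReader("<!ENTITY asdf \"&#161;\">"));
            }
            return null; // fall back to the parser's default resolution
        };
        reader.setEntityResolver(resolver);

        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" ?>\n"
                + "<!DOCTYPE test SYSTEM \"entity.dtd\">\n"
                + "<test>one &asdf; two!</test>";
        System.out.println(parseWithResolver(xml));
    }
}
```

Without the DOCTYPE line in the input, the resolver is never consulted for a DTD, which is exactly the problem described here.");
if (!result.equals("one \u00A1 two!")) {
    throw new AssertionError("unexpected text: " + result);
}
</test>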

This is the closest I have come so far (Groovy code, but close enough to Java that you can follow it):

    import org.xml.sax.*
    import org.xml.sax.ext.*
    import org.xml.sax.helpers.*

    class XmlFilter extends XMLFilterImpl {
        public XmlFilter( XMLReader reader ) { super(reader) }

        @Override
        public void startDocument() {
            super.startDocument()
            super.resolveEntity( null, 'file:///./entity.dtd')
            println "filter startDocument"
        }
    }

    class MyHandler extends DefaultHandler2 {
        public InputSource resolveEntity(String name, String publicId, String baseURI, String systemId) {
            println "entity: $name, $publicId, $baseURI, $systemId"
            return new InputSource(new StringReader('<!ENTITY asdf "&#161;">'))
        }
    }

    def handler = new MyHandler()
    def parser = XMLReaderFactory.createXMLReader()
    parser.setFeature 'http://xml.org/sax/features/use-entity-resolver2', true

    def filter = new XmlFilter( parser )
    filter.setContentHandler( handler )
    filter.setEntityResolver( handler )

    filter.parse( new InputSource(new StringReader('''<?xml version="1.0" ?>
    <test>one &asdf; two! &nbsp; &iexcl;&pound;&cent;</test>''')) );

I see resolveEntity being called, but I still get:

org.xml.sax.SAXParseException: The entity "asdf" was referenced, but not declared.
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
at org.xml.sax.helpers.XMLFilterImpl.parse(XMLFilterImpl.java:333)

I assume this happens because there is no way to inject SAX events that the parser itself knows about. I can only add events through a filter that sits between the parser and the ContentHandler, so they are passed on to the handler but never reach the parser. The document must therefore already be valid when it reaches the XMLReader. How can I get around this? I know I could modify the source stream to add a DOCTYPE, or maybe run a transformation to set the DTD. Are there any other options?

2 answers

You can try DoctypeChanger, which modifies the source stream as you suggested:

DoctypeChanger is a Java class that allows you to add, modify, or delete a DOCTYPE declaration from a byte stream, as it is passed to the XML parser.

    InputStream in = ... // get your XML InputStream
    DOCTYPEChangerStream changer = new DOCTYPEChangerStream(in);
    changer.setGenerator(
        new DoctypeGenerator() {
            public Doctype generate(Doctype old) {
                return new DoctypeImpl("rootElement", "pubId", "sysId", "internalSubset");
            }
        }
    );
    // .. and pass it on to the parser.
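If pulling in a library is not an option, the same idea can be sketched with the JDK alone. This is a rough illustration rather than a replacement for DoctypeChanger: it works on a String instead of a byte stream, it assumes the document has no DOCTYPE yet and that any XML declaration sits at the very start, and the names `DoctypePrepender`, `withDoctype`, and `parseText` are made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: splices a DOCTYPE with an internal subset into the XML text.
class DoctypePrepender {

    // Inserts the DOCTYPE after the XML declaration, which must remain first if present.
    public static String withDoctype(String xml, String rootName, String internalSubset) {
        String doctype = "<!DOCTYPE " + rootName + " [" + internalSubset + "]>";
        if (xml.startsWith("<?xml")) {
            int end = xml.indexOf("?>") + 2;
            return xml.substring(0, end) + doctype + xml.substring(end);
        }
        return doctype + xml;
    }

    // Parses the XML with SAX and returns the concatenated character data.
    public static String parseText(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" ?>\n<test>one &asdf; two!</test>";
        // The entity declaration is the one used in the question's handler.
        String patched = withDoctype(xml, "test", "<!ENTITY asdf \"&#161;\">");
        System.out.println(parseText(patched));
    }
}
```

Because the entity is declared before the parser sees the root element, the reference to `&asdf;` resolves instead of raising a SAXParseException.", "test", "<!ENTITY asdf \"&#161;\">");
if (!patched.startsWith("<?xml version=\"1.0\"?><!DOCTYPE test [")) {
    throw new AssertionError("DOCTYPE not placed after XML declaration: " + patched);
}
if (!DoctypePrepender.parseText(patched).equals("one \u00A1 two!")) {
    throw new AssertionError("entity was not expanded");
}
String noDecl = DoctypePrepender.withDoctype("<test/>", "test", "");
if (!noDecl.equals("<!DOCTYPE test []><test/>")) {
    throw new AssertionError("unexpected result: " + noDecl);
}
</test>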

I would use an XSLT stylesheet that performs the identity transformation, and use the xsl:output element with the doctype-system attribute (and doctype-public if I wanted to add a public identifier).

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output doctype-system="test.dtd"/>
      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>
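From Java, a similar result can be had without a stylesheet file, since a Transformer created with no stylesheet performs an identity transformation and accepts the same output property. The sketch below assumes the standard javax.xml.transform API; the class name `DoctypeViaTransform` and the method `addDoctype` are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: copies a document unchanged while emitting a DOCTYPE.
class DoctypeViaTransform {

    public static String addDoctype(String xml, String systemId) throws Exception {
        // newTransformer() without a stylesheet yields an identity transformer.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // Equivalent of <xsl:output doctype-system="..."/>.
        identity.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, systemId);

        StringWriter out = new StringWriter();
        identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addDoctype("<test>one two</test>", "test.dtd"));
    }
}
```

One caveat: the transformer has to parse the input first, so a document that references undeclared entities such as `&asdf;` would still fail before any DOCTYPE is written. This route fits documents that are already well-formed on their own.", "test.dtd");
if (!out.contains("<!DOCTYPE test SYSTEM \"test.dtd\">")) {
    throw new AssertionError("DOCTYPE missing: " + out);
}
if (!out.contains("<test>one two</test>")) {
    throw new AssertionError("content not copied: " + out);
}
</test>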

Source: https://habr.com/ru/post/1306984/