How can I ignore DTD validation, but retain Doctype when writing an XML file?

I am working on a system that should be able to read any (or at least any well-formed) XML file, manipulate a few nodes, and write the result back to the same file. I want my code to be as generic as possible, and I don't want:

  • hard-coded references to Schema / Doctype information anywhere in my code. The doctype information is in the source document; I want to keep exactly that information rather than supply it again from my code. If a document has no DocType, I won't add one. I do not care about the form or content of these files at all, except for my few nodes.
  • custom EntityResolvers or StreamFilters to omit or otherwise manipulate the source information (it is already a pity that namespace information seems somehow inaccessible from the document where it is declared, but I can manage by using more complex XPaths)
  • DTD validation. I do not have the referenced DTDs, I do not want to include them, and node manipulation is perfectly possible without knowing about them.

The goal is to leave the source file completely unchanged except for the modified nodes, which are retrieved via XPath. I would like to get by with the standard javax.xml classes.

My progress:

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    factory.setAttribute("http://xml.org/sax/features/namespaces", true);
    factory.setAttribute("http://xml.org/sax/features/validation", false);
    factory.setAttribute("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
    factory.setAttribute("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

    factory.setNamespaceAware(true);
    factory.setIgnoringElementContentWhitespace(false);
    factory.setIgnoringComments(false);
    factory.setValidating(false);
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse(new InputSource(inStream));

This loads the XML source into org.w3c.dom.Document successfully, ignoring DTD validation. I can do my replacements and then I use

    Source source = new DOMSource(document);
    Result result = new StreamResult(getOutputStream(getPath()));

    // Write the DOM document to the file
    Transformer xformer = TransformerFactory.newInstance().newTransformer();
    xformer.transform(source, result);

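to write the document back. The Doctype, however, is missing from the output. While debugging, I can see a DeferredDoctypeImpl [log4j:configuration: null] object in the Document after parsing, but it does not make it into the written file. The file I tested with starts like this: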

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
    <log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/" debug="false">
    [...]

There are probably (easy?) ways to do this with third-party JARs, but I would rather not add any.

+3
3

As far as I can tell, you would have to use an XMLSerializer rather than a Transformer for this...
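If you would rather keep the Transformer, one possible workaround is to copy the public and system identifiers from the parsed DocumentType into the transformer's output properties. This is only a sketch under a few assumptions: the class and method names are made up for illustration, the load-external-dtd feature assumes the JDK's built-in Xerces parser, and an internal DTD subset would not survive this approach.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
public class DoctypeKeepingWriter {

    /** Parses the XML text without validating and without fetching the external DTD. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        // Xerces-specific feature (assumption: the JDK's default parser is in use).
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Serializes the document, re-emitting the DOCTYPE only if the source had one. */
    static String write(Document document) throws Exception {
        Transformer xformer = TransformerFactory.newInstance().newTransformer();
        DocumentType doctype = document.getDoctype();
        if (doctype != null) {
            // Copy the identifiers found in the document; nothing is hard-coded here.
            if (doctype.getPublicId() != null) {
                xformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
            }
            if (doctype.getSystemId() != null) {
                xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
            }
        }
        StringWriter out = new StringWriter();
        xformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }
}
```

Because the identifiers are read from the document itself, a file without a DOCTYPE is written without one.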

+2

Here is how you could do it with LSSerializer, which ships with the JDK:

    private void writeDocument(Document doc, String filename)
            throws IOException {
        Writer writer = null;
        try {
            /*
             * Could extract "ls" to an instance attribute, so it can be reused.
             */
            DOMImplementationLS ls = (DOMImplementationLS) 
                    DOMImplementationRegistry.newInstance().
                            getDOMImplementation("LS");
            writer = new OutputStreamWriter(new FileOutputStream(filename));
            LSOutput lsout = ls.createLSOutput();
            lsout.setCharacterStream(writer);
            /*
             * If "doc" has been constructed by parsing an XML document, we
             * should keep its encoding when serializing it; if it has been
             * constructed in memory, its encoding has to be decided by the
             * client code.
             */
            lsout.setEncoding(doc.getXmlEncoding());
            LSSerializer serializer = ls.createLSSerializer();
            serializer.write(doc, lsout);
        } catch (Exception e) {
            throw new IOException(e);
        } finally {
            if (writer != null) writer.close();
        }
    }

These are the imports it needs:

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import org.w3c.dom.Document;
    import org.w3c.dom.bootstrap.DOMImplementationRegistry;
    import org.w3c.dom.ls.DOMImplementationLS;
    import org.w3c.dom.ls.LSOutput;
    import org.w3c.dom.ls.LSSerializer;
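One detail worth noting about writeDocument: the OutputStreamWriter there is created with the platform default charset, while the LSOutput is told to use the document's own encoding, so the two can disagree. A possible variant, sketched below with a made-up class name, hands the serializer a byte stream so that it does the encoding itself. The UTF-8 fallback for documents that declare no encoding is my assumption.

```java
import java.io.OutputStream;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical helper class; the name is illustrative only.
public class LsByteStreamWriter {

    /** Writes the document to the stream; the caller is responsible for closing it. */
    static void write(Document doc, OutputStream out) throws Exception {
        DOMImplementationLS ls = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        LSOutput lsout = ls.createLSOutput();
        // With a byte stream the serializer encodes the characters itself,
        // so the declared encoding and the actual bytes stay consistent.
        lsout.setByteStream(out);
        String encoding = doc.getXmlEncoding();
        // Assumption: fall back to UTF-8 when the source declared no encoding.
        lsout.setEncoding(encoding != null ? encoding : "UTF-8");
        LSSerializer serializer = ls.createLSSerializer();
        serializer.write(doc, lsout);
    }
}
```

It can be called with a FileOutputStream inside a try-with-resources block, which also takes care of closing the file.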


0

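LSSerializer did not keep the Doctype in my case. What worked for me is the XMLSerializer below. The code is Scala, but it uses the Java classes: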

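    import java.io.{File, FileOutputStream, OutputStreamWriter}
    import org.w3c.dom.Element
    // JDK-internal copy of the Xerces serializer; it may be inaccessible on newer JDKs.
    import com.sun.org.apache.xml.internal.serialize.{OutputFormat, XMLSerializer}

    def transformXML(root: Element, file: String): Unit = {
      val doc = root.getOwnerDocument
      // OutputFormat(doc) derives the output method and doctype identifiers from the document.
      val format = new OutputFormat(doc)
      format.setIndenting(true)
      val writer = new OutputStreamWriter(new FileOutputStream(new File(file)))
      try {
        val serializer = new XMLSerializer(writer, format)
        serializer.serialize(doc)
      } finally {
        writer.close() // flush and release the file handle
      }
    }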
0

Source: https://habr.com/ru/post/1703750/

