How can I escape special characters using the DOM

This problem has been bothering me a lot lately, and I can't find a workable solution.

I am dealing with a web server that receives an XML document for processing. The server's parser has problems with &, ', ", < and >. I know this is bad, but I did not implement the XML parser on this server. Until a fix is released, I need a workaround.

So, before uploading my XML document to this server, I need to parse it and escape the special XML characters. I am currently using the DOM. The problem is that if I iterate over the TEXT_NODEs and replace all special characters with their escaped versions, then when I save the document,

for d'ex I get d&amp;apos;ex, but I need d&apos;ex

This makes sense, since the DOM escapes "&". But obviously it is not what I need.

So, if the DOM is already capable of escaping "&" to "&amp;", how can I make it escape other characters as well, such as " to &quot;?

If it cannot, how can I store the already escaped text in the nodes without it being escaped again when the document is saved?
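The double escaping described above can be reproduced in isolation. The following is a minimal sketch, assuming the JDK's built-in JAXP implementation; the class name DoubleEscapeDemo and the helper serialize are hypothetical names chosen for illustration, and the element name Container is arbitrary.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DoubleEscapeDemo {

    // Hypothetical helper: wraps the text in a single element and serializes
    // the document with the identity transformer, as the question does.
    static String serialize(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("Container");
        root.appendChild(doc.createTextNode(text));
        doc.appendChild(root);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Raw apostrophe: the serializer leaves it untouched in text content.
        System.out.println(serialize("d'ex"));
        // Pre-escaped text: the "&" of "&apos;" is escaped a second time.
        System.out.println(serialize("d&apos;ex"));
    }
}
```

With the default JDK serializer, the second call is the one that yields the unwanted d&amp;apos;ex form: the node value is treated as literal characters, so the serializer only sees an ampersand that needs escaping.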

Here is how I escape the special characters; I used the Apache StringEscapeUtils class:

public String xMLTransform() throws Exception {
    String xmlfile = FileUtils.readFileToString(new File(filepath));

    DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
    // Drop any leading non-word characters (such as a BOM) before the opening "<".
    Document doc = docBuilder.parse(new InputSource(new StringReader(xmlfile.trim().replaceFirst("^([\\W]+)<", "<"))));

    NodeList nodeList = doc.getElementsByTagName("*");

    for (int i = 0; i < nodeList.getLength(); i++) {
        Node currentNode = nodeList.item(i);
        if (currentNode.getNodeType() == Node.ELEMENT_NODE) {
            Node child = currentNode.getFirstChild();
            while (child != null) {
                if (child.getNodeType() == Node.TEXT_NODE) {
                    // Escaping works here. But when saving the final document,
                    // the "&" used in escaping gets escaped as well by the DOM.
                    child.setNodeValue(StringEscapeUtils.escapeXml10(child.getNodeValue()));
                }
                child = child.getNextSibling();
            }
        }
    }

    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    DOMSource source = new DOMSource(doc);
    StringWriter writer = new StringWriter();
    StreamResult result = new StreamResult(writer);
    transformer.transform(source, result);

    File file = File.createTempFile("escapedXML" + UUID.randomUUID(), ".xml");
    String xmlString = writer.toString();
    byte[] contentInBytes = xmlString.getBytes();

    // try-with-resources closes the stream even if the write fails.
    try (FileOutputStream fop = new FileOutputStream(file)) {
        fop.write(contentInBytes);
        fop.flush();
    }

    return file.getPath();
}
+4
4 answers

As others have mentioned, you could use XSLT or HTML encoding.

If you go the XSLT route, here is a Java test:

@Test
    public void testXSLTTransforms () throws Exception {
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
        Document doc = docBuilder.newDocument();
        Element el = doc.createElement("Container");
        doc.appendChild(el);


        Text e = doc.createTextNode("Character");
        el.appendChild(e);
        //e.setNodeValue("\'");
        //e.setNodeValue("\"");

        e.setNodeValue("&");



        TransformerFactory transformerFactory = TransformerFactory.newInstance();       
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");        
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");


        DOMSource source = new DOMSource(doc);
        StreamResult result = new StreamResult(System.out);
        //This prints the original document to the command line.
        transformer.transform(source, result);

        InputStream xsltStream = getClass().getResourceAsStream("/characterswap.xslt");
        Source xslt = new StreamSource(xsltStream);
        transformer = transformerFactory.newTransformer(xslt);
        //This one is the one you'd pipe to a file
        transformer.transform(source, result);
    }

The XSLT it loads is the following:

characterswap.xslt

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
 <xsl:text> &#xa;  Original VALUE :  </xsl:text>
     <xsl:copy-of select="."/>
     <xsl:text> &#xa;  OUTPUT ESCAPING DISABLED :  </xsl:text>
      <xsl:value-of select="." disable-output-escaping="yes"/>
      <xsl:text> &#xa;  OUTPUT ESCAPING ENABLED :  </xsl:text>
      <xsl:value-of select="." disable-output-escaping="no"/>
 </xsl:template>

</xsl:stylesheet>

Output:

<?xml version="1.0" encoding="UTF-8"?>
<Container>&amp;</Container>

  Original VALUE :  <Container>&amp;</Container> 
  OUTPUT ESCAPING DISABLED :  & 
  OUTPUT ESCAPING ENABLED :  &amp;

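The node is captured in the XSLT and written out three ways. As the output shows, disabling output escaping on the node's value emits it verbatim, while leaving escaping enabled produces the escaped form.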
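Applied to the original question, the same mechanism can pass already escaped text through unchanged. The sketch below assumes the text nodes have been escaped beforehand (for example with escapeXml10) and that the XSLT processor honors disable-output-escaping, which is optional in XSLT 1.0 and only takes effect when serializing to a stream. The class name, the helper transform and the inline stylesheet are hypothetical and written for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class RawTextStylesheetDemo {

    // Identity transform, except that text nodes are written without output escaping.
    // The explicit priority avoids a conflict between the node() and text() patterns.
    static final String XSLT =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"text()\" priority=\"1\">"
        + "<xsl:value-of select=\".\" disable-output-escaping=\"yes\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical helper: builds <Container>text</Container> and runs the stylesheet on it.
    static String transform(String alreadyEscapedText) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("Container");
        root.appendChild(doc.createTextNode(alreadyEscapedText));
        doc.appendChild(root);

        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The text node holds the escaped form; the stylesheet writes it as-is.
        System.out.println(transform("d&apos;ex"));
    }
}
```

The trade-off is that every text node must then be escaped correctly beforehand, because the serializer no longer protects against a stray & or <.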


Alternatively, without XSLT: it looks like you are encoding for xml10 rather than for HTML.

So, where you set the node:

if (child.getNodeType() == Node.TEXT_NODE) {
    child.setNodeValue(StringEscapeUtils.escapeXml10(child.getNodeValue()));
}

you would instead encode for HTML:

if (child.getNodeType() == Node.TEXT_NODE) {
    //Capture the current node value
    String nodeValue = child.getNodeValue();
    //Decode for XML10 to remove existing escapes
    String decodedNode = StringEscapeUtils.unescapeXml10(nodeValue);
    //Then Re-encode for HTML (3/4/5)
    String fullyEncodedHTML = StringEscapeUtils.escapeHtml3(decodedNode);
    //String fullyEncodedHTML = StringEscapeUtils.escapeHtml4(decodedNode);
    //String fullyEncodedHTML = StringEscapeUtils.escapeHtml5(decodedNode);

    //Then place the fully-encoded HTML back to the node
    child.setNodeValue(fullyEncodedHTML);
}

That way the text ends up with HTML escapes instead of the xml ones.

+3

You can do this with a single regex replacement (shown here in Java):

String newSearch = search.replaceAll("(?=[]\\[+&|!(){}^\"~*?:\\\\-])", "\\\\");

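That whacky regex is a "look-ahead": a non-capturing assertion that the following char matches something, in this case a character class.

Note that you don't need to escape chars inside a character class, except a ] (even the minus doesn't need escaping if it comes first or last).

The \\\\ is how you write a literal \ (escaped once for Java, once for the regex engine).

Here is a test:

public static void main(String[] args) {
    String search = "code:xy";
    String newSearch = search.replaceAll("(?=[]\\[+&|!(){}^\"~*?:\\\\-])", "\\\\");
    System.out.println(newSearch);
}

Output: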

code\:xy
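The look-ahead above prefixes each special character with a backslash. For XML the character has to be replaced by an entity instead, so a sketch adapted to the question could look like the following. It assumes Java 9 or later (for Matcher.replaceAll with a function); the class XmlEntityEscaper and its escape method are hypothetical names, not part of any library.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlEntityEscaper {

    // The five characters that have predefined entities in XML.
    private static final Pattern SPECIAL = Pattern.compile("[&<>\"']");

    private static final Map<String, String> ENTITIES = Map.of(
            "&", "&amp;",
            "<", "&lt;",
            ">", "&gt;",
            "\"", "&quot;",
            "'", "&apos;");

    // Replaces every special character in a single pass, so an "&" produced
    // by one replacement is never visited again.
    static String escape(String input) {
        return SPECIAL.matcher(input)
                .replaceAll(m -> Matcher.quoteReplacement(ENTITIES.get(m.group())));
    }

    public static void main(String[] args) {
        System.out.println(escape("d'ex & <b>")); // prints d&apos;ex &amp; &lt;b&gt;
    }
}
```

This only produces the escaped string. If the result is put back into a DOM text node and serialized normally, the "&" is escaped again, exactly as described in the question.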

+1

(Do you mean XML or URL encoding, i.e. &lt; &gt; &amp; etc.?)

, , . XML "". DOM XML , XML HTML. IOUtils StringUtils . !

+1

Source: https://habr.com/ru/post/1648474/

