Read an escaped quote as an escaped quote from XML

I load an XML file into a DOM and read the text content of its elements.

The code:

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class MyTest {
    public static void main(String[] args) {
        Document doc = XMLUtils.fileToDom("MyTest.xml"); // loads the XML data into a DOM
        Element rootElement = doc.getDocumentElement();
        NodeList nodes = rootElement.getChildNodes();
        // Items 0 and 2 are whitespace text nodes, so the <test> elements are at 1 and 3.
        Node child1 = nodes.item(1);
        Node child2 = nodes.item(3);
        String str1 = child1.getTextContent();
        String str2 = child2.getTextContent();
        if (str1 != null) {
            System.out.println(str1.equals(str2));
        }
        System.out.println();
        System.out.println(str1);
        System.out.println(str2);
    }
}

MyTest.xml

<tests>
   <test name="1">ff1 &quot;</test>
   <test name="2">ff1 "</test>
</tests>

Result:

true

ff1 "
ff1 "

Desired Result:

false

ff1 &quot;
ff1 "

So I need to distinguish between these two cases: the quote written as an escaped entity (&quot;) and the quote written as a literal character.

Please help.

Thanks in advance.

P.S. The code for XMLUtils#fileToDom(String filePath), a fragment of the XMLUtils class:

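private static DocumentBuilder docNonValidatingBuilder;

static {
    DocumentBuilderFactory dFactory = DocumentBuilderFactory.newInstance();
    dFactory.setNamespaceAware(false);
    dFactory.setValidating(false);
    try {
        docNonValidatingBuilder = dFactory.newDocumentBuilder();
    } catch (ParserConfigurationException e) {
        // Fail fast instead of leaving the builder null.
        throw new ExceptionInInitializerError(e);
    }
}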

public static DocumentBuilder getNonValidatingBuilder() {
    return docNonValidatingBuilder;
}

public static Document fileToDom(String filePath) {

    Document doc = getNonValidatingBuilder().newDocument();
    File f = new File(filePath);
    if(!f.exists())
        return doc;

    try {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult(doc);
        StreamSource source = new StreamSource(f);
        transformer.transform(source, result);
    } catch (Exception e) {
        return doc;
    }

    return doc;

}
3 answers

I looked at the Apache Xerces source code and can offer a solution, although it is a monkey patch. I wrote a simple class:

package a;
import java.io.IOException;
import org.apache.xerces.impl.XMLDocumentScannerImpl;
import org.apache.xerces.parsers.NonValidatingConfiguration;
import org.apache.xerces.xni.XMLString;
import org.apache.xerces.xni.XNIException;
import org.apache.xerces.xni.parser.XMLComponent;

public class MyConfig extends NonValidatingConfiguration {

    private MyScanner myScanner;

    @Override
    @SuppressWarnings("unchecked")
    protected void configurePipeline() {
        if (myScanner == null) {
            myScanner = new MyScanner();
            addComponent((XMLComponent) myScanner);
        }
        super.fProperties.put(DOCUMENT_SCANNER, myScanner);
        super.fScanner = myScanner;
        super.fScanner.setDocumentHandler(this.fDocumentHandler);
        super.fLastComponent = fScanner;
    }

    private static class MyScanner extends XMLDocumentScannerImpl {

        @Override
        protected void scanEntityReference() throws IOException, XNIException {
            // name
            String name = super.fEntityScanner.scanName();
            if (name == null) {
                reportFatalError("NameRequiredInReference", null);
                return;
            }

            super.fDocumentHandler.characters(new XMLString(("&" + name + ";")
                .toCharArray(), 0, name.length() + 2), null);

            // end
            if (!super.fEntityScanner.skipChar(';')) {
                reportFatalError("SemicolonRequiredInReference",
                        new Object[] { name });
            }
            fMarkupDepth--;
        }
    }

}

Before parsing starts, you only need to add the following line to the main method:

System.setProperty(
        "org.apache.xerces.xni.parser.XMLParserConfiguration",
        "a.MyConfig");

And you get the desired result:

false

ff1 &quot;
ff1 "

Get the child TEXT_NODE and call getNodeValue on it (this returns the raw text content, or null if there is no text):

public static String getRawContent(Node n) {
  if (n == null) {
      return null;
  }

  Node n1 = getChild(n, Node.TEXT_NODE);

  if (n1 == null) {
      return null;
  }

  return n1.getNodeValue();
}

Source: http://www.java2s.com/Code/Java/XML/Gettherawtextcontentofanodeornullifthereisnotext.htm
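
The getChild helper is not shown in this answer. Below is a minimal sketch, assuming it returns the first direct child of the requested node type, so the snippet can be tried against the XML from the question. The class name RawContentDemo and the rawContents method are made up for illustration. Since getNodeValue returns the text as the parser delivered it, it is worth checking whether &quot; actually survives before relying on this approach:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class RawContentDemo {

    // Hypothetical helper: the first direct child of the given node type, or null.
    static Node getChild(Node parent, short type) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == type) {
                return c;
            }
        }
        return null;
    }

    // Same logic as getRawContent in the answer above.
    static String getRawContent(Node n) {
        if (n == null) {
            return null;
        }
        Node text = getChild(n, Node.TEXT_NODE);
        return text == null ? null : text.getNodeValue();
    }

    // Illustrative only: parses the XML and collects getRawContent for every <test> element.
    static String[] rawContents(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList tests = doc.getElementsByTagName("test");
        String[] out = new String[tests.getLength()];
        for (int i = 0; i < tests.getLength(); i++) {
            out[i] = getRawContent(tests.item(i));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<tests><test name=\"1\">ff1 &quot;</test>"
                + "<test name=\"2\">ff1 \"</test></tests>";
        for (String s : rawContents(xml)) {
            System.out.println(s);
        }
    }
}
```

With the JDK's built-in parser I would expect neither line to contain the literal text &quot;, because the entity reference is resolved before the text node is created.<test name=\"2\">ff1 \"</test></tests>");
if (r.length != 2) throw new AssertionError("expected two test elements");
if (r[0] == null || r[0].contains("&quot;")) throw new AssertionError("entity reference should already be resolved");
if (!"ff1 \"".equals(r[1])) throw new AssertionError("unexpected second value: " + r[1]);
if (RawContentDemo.getRawContent(null) != null) throw new AssertionError("null node should give null");
</test>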


There is no way to do this for internal entities. XML does not support this concept. An internal entity is just another way of writing the same PSVI content as text, so the two forms are not distinguishable after parsing.
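
To illustrate the point, here is a small sketch using only the JDK's standard DOM parser (the class name EntityDemo and the textOf method are made up for illustration). It shows that the escaped and the literal quote give the same text after parsing. It also shows one possible workaround, which assumes you control how the file is written: escaping the ampersand itself (&amp;quot;) keeps the characters &quot; in the parsed text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityDemo {

    // Illustrative helper: parses an XML string and returns the root element's text content.
    static String textOf(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String escaped = textOf("<test>ff1 &quot;</test>");
        String literal = textOf("<test>ff1 \"</test>");
        // The parser resolves &quot; so both strings are identical.
        System.out.println(escaped.equals(literal)); // prints true

        // Escaping the ampersand preserves the entity text in the parsed content.
        System.out.println(textOf("<test>ff1 &amp;quot;</test>")); // prints ff1 &quot;
    }
}
```

The workaround changes what the document means (the element now contains the six characters &quot; as data), so it only helps if the producer of the file can be changed.").equals("ff1 \"")) throw new AssertionError("escaped quote should parse to a plain quote");
if (!EntityDemo.textOf("<test>ff1 &quot;</test>").equals(EntityDemo.textOf("<test>ff1 \"</test>"))) throw new AssertionError("both forms should be equal");
if (!EntityDemo.textOf("<test>ff1 &amp;quot;</test>").equals("ff1 &quot;")) throw new AssertionError("double-escaped form should keep the entity text");
</test>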


Source: https://habr.com/ru/post/1740835/
