Extremely slow XSLT transformation in Java

I am trying to transform an XML document using XSLT. As input, I have the XHTML source of www.wordpress.org, and the XSLT is a dummy example that extracts the site title (in fact, it could do nothing at all and the result would be the same).

Whichever transformer or API I use, the transformation takes about 2 minutes! If you look at the source of wordpress.org, you will notice that it is just 183 lines of code. From what I have googled, this is probably related to building the DOM tree. No matter how simple the XSLT is, it always takes 2 minutes, which supports the idea that the delay comes from building the DOM. Even so, in my opinion that should not take 2 minutes.

Here is the sample code (nothing special):

    TransformerFactory tFactory = TransformerFactory.newInstance();
    Transformer transformer = null;
    try {
        transformer = tFactory.newTransformer(
                new StreamSource("/home/pd/XSLT/transf.xslt"));
    } catch (TransformerConfigurationException e) {
        e.printStackTrace();
    }
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    System.out.println("START");
    try {
        transformer.transform(
                new SAXSource(new InputSource(
                        new FileInputStream("/home/pd/XSLT/wordpress.xml"))),
                new StreamResult(outputStream));
    } catch (TransformerException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    System.out.println("STOP");
    System.out.println(new String(outputStream.toByteArray()));

It is between START and STOP that Java "pauses" for 2 minutes. If I look at CPU or memory usage, nothing increases. It looks as if the JVM has simply stopped...

Do you have experience transforming XML that is longer than 50 lines (that is an arbitrary number ;))? From what I have read, XSLT always needs to build a DOM tree in order to do its job. Fast transformation is crucial to me.

Thanks in advance, Peter

+4
4 answers

Does the sample HTML file declare a namespace or a DOCTYPE? If so, your XML parser may be trying to retrieve content (possibly a schema or DTD) from the referenced URI. If every run takes almost exactly two minutes, the delay is probably one or more TCP timeouts.

You can verify this by timing the step where the WordPress XML is actually parsed (the one that consumes the InputSource), since that is most likely the line causing the delay. Having looked at the sample file you posted, I can confirm that it declares a namespace (xmlns="http://www.w3.org/1999/xhtml").
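To time the parsing step on its own, a minimal sketch could look like the following. The class name ParseTimingSketch, the method timeParseMillis and the inline demo document are made up for illustration; the demo document has no DOCTYPE, so it never touches the network.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class ParseTimingSketch {

    // Hypothetical stand-in document; it has no DOCTYPE, so nothing external is requested.
    static final String DEMO_XML =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>Demo</title></head><body/></html>";

    /** Parses the input into a DOM and returns the elapsed wall-clock time in milliseconds. */
    static long timeParseMillis(InputSource input) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();

        long start = System.nanoTime();
        db.parse(input); // parsing only, no XSLT involved
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        long ms = timeParseMillis(new InputSource(new StringReader(DEMO_XML)));
        System.out.println("Parsing took " + ms + " ms");
    }
}
```

Passing an InputSource that wraps a FileInputStream over /home/pd/XSLT/wordpress.xml instead of the demo string would show whether the two minutes are spent in parsing alone. If they are, the stylesheet is unlikely to be the cause.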

To get around this, you can implement your own EntityResolver, which essentially disables URL-based resolution. You may need to use the DOM for that; see DocumentBuilder.setEntityResolver.

Here's a sample that uses the DOM and disables resolution (note: this is not verified):

    try {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        dbFactory.setNamespaceAware(true); // XSLT needs a namespace-aware DOM
        DocumentBuilder db = dbFactory.newDocumentBuilder();
        db.setEntityResolver(new EntityResolver() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId)
                    throws SAXException, IOException {
                // Never resolve any IDs: hand back an empty source. Returning null
                // would tell the parser to open the system ID URL itself.
                return new InputSource(new StringReader(""));
            }
        });
        System.out.println("BUILDING DOM");
        Document doc = db.parse(new FileInputStream("/home/pd/XSLT/wordpress.xml"));
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        TransformerFactory tFactory = TransformerFactory.newInstance();
        Transformer transformer = tFactory.newTransformer(
                new StreamSource("/home/pd/XSLT/transf.xslt"));
        System.out.println("RUNNING TRANSFORM");
        transformer.transform(
                new DOMSource(doc.getDocumentElement()),
                new StreamResult(outputStream));
        System.out.println("TRANSFORMED CONTENTS BELOW");
        System.out.println(outputStream.toString());
    } catch (Exception e) {
        e.printStackTrace();
    }

If you want to use SAX, you will need to use a SAXSource with an XMLReader that uses your custom resolver.
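The SAX variant could be sketched roughly as follows. This is an untested-in-production illustration, not a definitive implementation: the class name SaxNoFetchSketch, the demo document, the demo stylesheet and the unreachable DTD URL are all invented for the example. The resolver returns an empty source for every external entity, so the parser never opens a network connection.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

final class SaxNoFetchSketch {

    // Hypothetical input whose DOCTYPE points at a host that cannot be resolved.
    static final String DEMO_XML =
            "<!DOCTYPE html SYSTEM \"http://example.invalid/unreachable.dtd\">"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>Demo</title></head><body/></html>";

    // Hypothetical stylesheet that prints the page title as plain text.
    static final String DEMO_XSLT =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
            + " xmlns:h=\"http://www.w3.org/1999/xhtml\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:value-of select=\"/h:html/h:head/h:title\"/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Transforms xml with xslt without letting the parser fetch external entities. */
    static String transform(String xml, String xslt) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // XSLT matching depends on namespaces
        XMLReader reader = spf.newSAXParser().getXMLReader();

        // Supply an empty stream for every external entity, the DTD included.
        reader.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader("")));

        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(
                new SAXSource(reader, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(DEMO_XML, DEMO_XSLT));
    }
}
```

One caveat with an empty DTD: named entities that the real DTD would define (for example &amp;nbsp; in XHTML) become undeclared, so a document that uses them may fail to parse unless a local copy of the DTD is supplied instead.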

+9

The comments suggesting that the answer lies in the EntityResolver are probably right. However, rather than simply not downloading the schemas, a better solution may be to load them from the local file system.

So you can do something like this:

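    db.setEntityResolver(new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            // systemId is normally a full URL, so keep only its file name
            String fileName = systemId.substring(systemId.lastIndexOf('/') + 1);
            // File does not understand "classpath:", so read from the classpath directly
            InputStream in = getClass().getResourceAsStream("/xsd/" + fileName);
            if (in == null) {
                logger.error("File not found: xsd/" + fileName);
                return null; // the parser then falls back to opening systemId itself
            }
            InputSource is = new InputSource(in);
            is.setSystemId(systemId); // keeps relative references resolvable
            return is;
        }
    });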
+2

Most likely, the problem is not the transformer.transform call itself. More likely, you are doing something in your XSLT that takes forever. My suggestion would be to use a tool such as Oxygen or XMLSpy to profile the XSLT and find out which templates take the most time. Once you have determined that, you can start optimizing those templates.
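Without a profiler at hand, a rough first check is to compare the stylesheet against an identity transform, which copies the input unchanged. The sketch below is only an illustration under stated assumptions: the class name IdentityBaselineSketch, its methods and the demo inputs are hypothetical, and wall-clock timing of a single run is a crude measure.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class IdentityBaselineSketch {

    // Hypothetical stand-in input and stylesheet.
    static final String DEMO_XML = "<root><title>Demo</title></root>";
    static final String DEMO_XSLT =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\"><xsl:value-of select=\"/root/title\"/></xsl:template>"
            + "</xsl:stylesheet>";

    /** Runs one transformation and returns the elapsed time in milliseconds. */
    static long timeMillis(Transformer transformer, String xml) throws Exception {
        StringWriter out = new StringWriter();
        long start = System.nanoTime();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Returns { identity time, stylesheet time } in milliseconds for the same input. */
    static long[] compare(String xml, String xslt) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer identity = factory.newTransformer(); // no stylesheet: plain copy
        Transformer styled = factory.newTransformer(new StreamSource(new StringReader(xslt)));
        return new long[] { timeMillis(identity, xml), timeMillis(styled, xml) };
    }

    public static void main(String[] args) throws Exception {
        long[] times = compare(DEMO_XML, DEMO_XSLT);
        System.out.println("identity: " + times[0] + " ms, stylesheet: " + times[1] + " ms");
    }
}
```

If the identity run is just as slow as the stylesheet run on the real input, the time is being spent reading the document rather than in any template, and profiling the XSLT will not help.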

+1

If you are debugging your code on an Android device, make sure you also try it without Eclipse attached to the process. While I was debugging my application, XSLT transformations took 8 seconds, whereas the same process took a tenth of a second on iOS in native code. Once I ran the code without Eclipse attached, it took a time comparable to its C counterpart.

0

Source: https://habr.com/ru/post/1337057/