Xerces behaves differently on Sun JRE 1.5 and IBM J9 1.5

I am trying to parse some HTML using NekoHTML.

The problem is that the code snippet below works fine on Sun JDK 1.5.0_01, which is what I use when I run it from Eclipse with the Sun JRE. When the same code runs on the IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Windows XP x86-32 j9vmwi3223ifx-20070323, JIT enabled), which is what I use when developing in IBM RAD, it does not work.

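    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    // doc is the org.w3c.dom.Document produced by NekoHTML
    NodeList tags = doc.getElementsByTagName("td");
    for (int i = 0; i < tags.getLength(); i++) {
        Element elem = (Element) tags.item(i);
        // do something with elem
    }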

By "works fine" I mean that I get a list of "td" elements that I can process further. On J9 the loop body is never entered, i.e. tags.getLength() returns 0.

I am using the latest version of NekoHTML (along with the Xerces jars). doc in the code above is an org.w3c.dom.Document; its runtime class is org.apache.html.dom.HTMLDocumentImpl.
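One way to see whether the two JVMs build different trees is to inspect the DOM right before the loop. The sketch below is only an illustration (the class and method names DomDump, lookupCounts and dump are hypothetical): lookupCounts compares getElementsByTagName for "td" and "TD" with the namespace-aware getElementsByTagNameNS("*", ...), and dump prints each element's nodeName, localName and namespace URI. main builds a stand-in document with the JDK's own DocumentBuilderFactory; in the real program you would pass the Document returned by NekoHTML instead and run it on both JVMs.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomDump {
    // Counts matches for a few lookups; if these differ between JVMs,
    // the parsers produced differently named elements.
    static String lookupCounts(Document doc) {
        return "td=" + doc.getElementsByTagName("td").getLength()
             + " TD=" + doc.getElementsByTagName("TD").getLength()
             + " ns(*,td)=" + doc.getElementsByTagNameNS("*", "td").getLength()
             + " ns(*,TD)=" + doc.getElementsByTagNameNS("*", "TD").getLength();
    }

    // Appends each element's nodeName, localName and namespace URI,
    // indented by its depth in the tree.
    static void dump(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName())
               .append(" local=").append(node.getLocalName())
               .append(" ns=").append(node.getNamespaceURI())
               .append('\n');
        }
        NodeList kids = node.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            dump(kids.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in document parsed by the JDK; replace with the NekoHTML Document.
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder().parse(new ByteArrayInputStream(
            "<table><tr><td>x</td></tr></table>".getBytes("UTF-8")));
        System.out.println(lookupCounts(doc));
        StringBuilder sb = new StringBuilder();
        dump(doc, 0, sb);
        System.out.print(sb);
    }
}
```

If the counts or names differ between Sun and J9, the problem lies in how the tree was built rather than in the loop itself.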

Details of the IBM J9 are as follows:

    java version "1.5.0"
    Java(TM) 2 Runtime Environment, Standard Edition (build pwi32devifx-20070323 (ifix 117674: SR4 + 116644 + 114941 + 116110 + 114881))
    IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Windows XP x86-32 j9vmwi3223ifx-20070323 (JIT enabled)
    J9VM - 20070322_12058_lHdSMR
    JIT - 20070109_1805ifx3_r8
    GC - WASIFIX_2007)
    JCL - 20070131

Any idea, suggestion or workaround is appreciated. Thanks.

1 answer

I have 2 ideas.

  • I just checked, and Xerces is part of the JRE installation, so I believe the copy your application uses may come from there. Sun and IBM probably ship different versions of Xerces. So, as a first step, check which one is actually loaded, and try replacing the IBM one with the Sun version. If that helps, you have two options: keep using IBM Java with Sun's Xerces, or keep investigating what is wrong with IBM's Xerces.
  • Are there other differences between your development and production environments? Do they run the same operating system? Is there a chance that, for example, you develop on Windows and deploy on Unix, while your XML is written on Windows with \r\n line endings? Or, going further: if your XML contains Unicode characters and was saved on Windows, it may begin with an invisible byte order mark (BOM) that marks it as Unicode. That prefix may cause the parser to fail.
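To act on the first idea, it helps to know where each JVM actually loads the relevant classes from. This is a minimal sketch, not a tested recipe: the class name WhichXerces is hypothetical, and the list of class names is an assumption covering Xerces (org.apache.xerces.parsers.DOMParser), the HTML DOM from the question (org.apache.html.dom.HTMLDocumentImpl), NekoHTML's parser (org.cyberneko.html.parsers.DOMParser) and the copy bundled inside Sun's JRE (com.sun.org.apache.xerces.internal.impl.Version). A null code source usually means the class came from the bootstrap/JRE classpath. It also calls org.apache.xerces.impl.Version.getVersion() reflectively, which Xerces-J 2.x provides.

```java
import java.security.CodeSource;

public class WhichXerces {
    // Describes where the named class was loaded from, without initializing it.
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className, false, WhichXerces.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return className + " -> bootstrap/JRE (no code source)";
            }
            return className + " -> " + src.getLocation();
        } catch (ClassNotFoundException e) {
            return className + " -> not found";
        } catch (LinkageError e) {
            // e.g. NekoHTML present but its Xerces superclass missing
            return className + " -> cannot load (" + e + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println("JVM: " + System.getProperty("java.vm.name")
            + " " + System.getProperty("java.vm.version"));
        String[] names = {
            "org.apache.xerces.parsers.DOMParser",
            "org.apache.html.dom.HTMLDocumentImpl",
            "org.cyberneko.html.parsers.DOMParser",
            "com.sun.org.apache.xerces.internal.impl.Version"
        };
        for (String n : names) {
            System.out.println(locate(n));
        }
        // Xerces-J 2.x exposes a static getVersion(); call it reflectively so
        // this class still compiles and runs when Xerces is absent.
        try {
            Object v = Class.forName("org.apache.xerces.impl.Version")
                .getMethod("getVersion").invoke(null);
            System.out.println("Xerces version: " + v);
        } catch (Exception e) {
            System.out.println("Xerces version: unavailable (" + e.getClass().getSimpleName() + ")");
        }
    }
}
```

Running it once from Eclipse on the Sun JRE and once from RAD on J9, then comparing the printed locations and versions, shows whether the two setups really load different Xerces builds.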
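To rule the BOM idea in or out, one option is to strip a leading UTF-8 BOM (bytes EF BB BF) before the stream reaches the parser. A minimal sketch using only java.io; the names BomStripper and skipUtf8Bom are hypothetical, and it handles only the UTF-8 BOM, not the UTF-16 variants.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public class BomStripper {
    // UTF-8 byte order mark: EF BB BF
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    // Wraps the stream and consumes a leading UTF-8 BOM if present;
    // otherwise the bytes read ahead are pushed back unchanged.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == UTF8_BOM.length;
        for (int i = 0; hasBom && i < n; i++) {
            hasBom = (head[i] & 0xFF) == UTF8_BOM[i];
        }
        if (!hasBom && n > 0) {
            pin.unread(head, 0, n);
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '>'};
        InputStream s = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.println((char) s.read()); // prints "<"
    }
}
```

In the real program the wrapped stream (for example around a FileInputStream) would be handed to the parser instead of the raw one; if the results on both JVMs then match, the BOM was the culprit.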

Source: https://habr.com/ru/post/1334944/

