Xerces behaves differently on Sun JRE v1.5 and IBM J9 v1.5

Question


I am trying to parse some HTML using NekoHTML.

The code snippet below works fine on Sun JDK 1.5.0_01 (this is when I use Eclipse with the Sun JRE). But when the same code runs on the IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Windows XP x86-32 j9vmwi3223ifx-20070323 (JIT enabled)), it does not work (this is when I use IBM RAD for development).

    NodeList tags = doc.getElementsByTagName("td");
    for (int i = 0; i < tags.getLength(); i++) {
        Element elem = (Element) tags.item(i);
        // do something with elem
    }

By "works fine" I mean that I get a list of "td" elements that I can process further. On J9, the for loop is never entered.

I am using the latest version of NekoHTML (along with the Xerces jars). doc in the above code is of type org.w3c.dom.Document (the implementing class is org.apache.html.dom.HTMLDocumentImpl).

Details of the IBM J9 are as follows:

    java version "1.5.0"
    Java(TM) 2 Runtime Environment, Standard Edition (build pwi32devifx-20070323 (ifix 117674: SR4 + 116644 + 114941 + 116110 + 114881))
    IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Windows XP x86-32 j9vmwi3223ifx-20070323 (JIT enabled)
    J9VM - 20070322_12058_lHdSMR
    JIT  - 20070109_1805ifx3_r8
    GC   - WASIFIX_2007)
    JCL  - 20070131

Any idea, suggestion or workaround is appreciated. Thanks.


java dom html xerces j9

Favonius Dec 21 '10 at 9:24


1 answer

Alexr · Accepted Answer · 2010-12-21T10:24:24+0000

I have 2 ideas.

1. I just checked that Xerces is part of the JRE installation, so I believe it gets onto your application's classpath from there. Sun and IBM probably ship different versions of Xerces. So, as a first step, verify this and try replacing the Xerces you get with the IBM JRE with the Sun version. If that helps, you have two options: keep working with IBM Java using the Xerces from Sun, or keep investigating what is wrong with the Xerces from IBM.
2. Are there other differences between your dev and production environments? Are they the same operating system? Is there a chance that you use (for example) Windows for development and Unix for production, but your XML is written on Windows with \r\n as the line separator? Or even more: if your XML contains Unicode characters and was written on Windows, it may start with a special (invisible) prefix, the byte order mark (BOM), that indicates it is Unicode. This prefix may cause the parser to fail.
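Both checks can be done with a few lines of plain JDK code. The following is only a rough sketch, not something taken from the question: the class name ParserDiagnostics and both helper methods are hypothetical. describeOrigin reports which jar (if any) a class was loaded from, which helps to tell a bundled JRE parser apart from a Xerces jar on the application classpath. skipUtf8Bom assumes the invisible prefix is a UTF-8 byte order mark (bytes EF BB BF) and drops it before the stream reaches the parser.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.security.CodeSource;
import java.security.ProtectionDomain;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper class, for illustration only.
final class ParserDiagnostics {

    // Returns "<class name> <- <location>". Classes loaded by the bootstrap
    // class loader (i.e. shipped with the JRE) usually have no code source.
    static String describeOrigin(Class<?> type) {
        ProtectionDomain domain = type.getProtectionDomain();
        CodeSource source = (domain == null) ? null : domain.getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "bootstrap class loader (bundled with the JRE)"
                : source.getLocation().toString();
        return type.getName() + " <- " + location;
    }

    // Returns a stream positioned after a leading UTF-8 BOM (EF BB BF).
    // If there is no BOM, the bytes already read are pushed back untouched.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int read = 0;
        while (read < 3) {
            int n = pushback.read(head, read, 3 - read);
            if (n < 0) {
                break; // stream is shorter than three bytes
            }
            read += n;
        }
        boolean hasBom = read == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && read > 0) {
            pushback.unread(head, 0, read);
        }
        return pushback;
    }

    public static void main(String[] args) throws Exception {
        // Which JAXP implementation does this JVM pick, and from where?
        Class<?> factory = DocumentBuilderFactory.newInstance().getClass();
        System.out.println(describeOrigin(factory));

        // A tiny in-memory document that starts with a UTF-8 BOM.
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        InputStream clean = skipUtf8Bom(new ByteArrayInputStream(data));
        System.out.println((char) clean.read()); // first byte after the BOM
    }
}
```

Run the same class under both JVMs and compare the first line of output. In the real application, passing doc.getClass() (and the class of the parser instance) to describeOrigin should show whether both environments really load the same NekoHTML and Xerces jars.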
