I am using XOM with the following sample data:

    <html xmlns="http://www.w3.org/1999/xhtml" xmlns:html="http://www.w3.org/1999/xhtml">
      <head>
        <title>Patient Information</title>
      </head>
    </html>

The following query returns many elements (from real data):

    Element root = cleanDoc.getRootElement();
    // find all the bold elements, as those mark institution and clinic.
    Nodes nodes = root.query("//*");
but something like
returns nothing. If I iterate over the children of the root, the counts seem to match, and if I print the name of each element, everything looks right.
I take the HTML, parse it with TagSoup, and then create an XOM document from the resulting string. How can this go so terribly wrong? I feel like there is some kind of weird encoding problem, but I just don't see it. Java strings are strings, right?
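One thing that may be worth ruling out besides encoding, offered as an assumption rather than a confirmed diagnosis: the sample declares a default namespace (`http://www.w3.org/1999/xhtml`), and in XPath 1.0 an unprefixed name test only matches elements in no namespace, while `//*` matches everything. The sketch below reproduces that behavior on the sample data. It uses the JDK's own DOM and XPath classes as a stand-in for XOM so that it runs without extra libraries; the class name `NamespaceQueryDemo`, the helper `count`, and the query `//head` are hypothetical and chosen only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceQueryDemo {
    // The sample document from the question, collapsed onto one line.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\" "
        + "xmlns:html=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>Patient Information</title></head></html>";

    // Hypothetical helper: parse the XML namespace-aware and count XPath matches.
    static int count(String xml, String xpath) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // off by default in the JDK parser
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(xpath, doc, XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Wildcard: matches html, head and title regardless of namespace.
        System.out.println(count(SAMPLE, "//*"));
        // Unprefixed name: only matches elements in no namespace.
        System.out.println(count(SAMPLE, "//head"));
        // Matching on the local name sidesteps the namespace.
        System.out.println(count(SAMPLE, "//*[local-name()='head']"));
    }
}
```

If this turns out to be the cause, the XOM counterpart would presumably be the two-argument `query` that takes an `XPathContext`, for example `root.query("//x:head", new XPathContext("x", "http://www.w3.org/1999/xhtml"))`, where the prefix `x` is arbitrary.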