XML parsing with Jsoup

Question


I get the following XML, which represents the news:

 <content>
   Some text blalalala
   <h2>Small subtitle</h2>
   Some more text blbla
   <ul class="list">
     <li>List item 1</li>
     <li>List item 2</li>
   </ul>
   <br />
   Even more freakin text
 </content>

I know that the format is not perfect, but for now I have to accept it.

The article should look like this:

Some text blalalala
Small subtitle
List with items
Even more freakin text

I am parsing this XML with Jsoup. I can get the text directly inside the <content> tag using doc.ownText(), but then I cannot tell where the other material (the subtitle, the list) sits relative to it; I get only one big String.

Would it be better to use an event-based parser (I hate them :(), or is it possible to do something like doc.getTextUntilTagAppears("tagName")?

Edit: for clarification, I know how to get the elements under <content>. My problem is getting the text directly inside <content>, which is split every time a child element interrupts it.

I found out that I can get all the text in the content using .textNodes(). It works fine, but again I don't know where each text node belongs in my article (for example, one above the h2 and another at the bottom).


java xml jsoup

asco Jul 11 '13 at 10:43


2 answers

Jsoup has a fantastic selector-based syntax; see the Jsoup selector documentation.

If you want the subtitle

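 // Jsoup.parse(String) treats its argument as markup, not as a path, so pass a File (needs java.io.File; throws IOException)
 Document doc = Jsoup.parse(new File("path-to-your-xml"), "UTF-8"); // get the document node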

You know that the subtitles are in the h2 element

 Element subtitle = doc.select("h2").first(); // first h2 element that appears

And if you want the list:

 Elements listItems = doc.select("ul.list > li");
 for (Element item : listItems) {
     System.out.println(item.text()); // print list items one after another
 }


zEro Jul 11 '13 at 11:12


asco · Accepted Answer · 2013-07-11T12:27:50+0000

My mistake was iterating over the XML by Elements, which do not include TextNodes. When I iterate Node by Node instead, I can check whether each Node is an Element or a TextNode and handle it accordingly.
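
The node-by-node approach could look roughly like the sketch below. It is not the asker's actual code: the class name ContentWalker, the method walk, and the "text:", "subtitle:" and "item:" labels are made up for illustration, and it assumes the XML is already available as a String and is parsed with Jsoup's XML parser.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.List;

public class ContentWalker {

    // Walks the direct children of <content> in document order and returns
    // one labelled entry per non-blank text run, subtitle, or list item.
    // (Hypothetical helper; the labels are illustrative only.)
    static List<String> walk(String xml) {
        // Parse as XML so that <content> is kept exactly as written.
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        Element content = doc.select("content").first();

        List<String> parts = new ArrayList<>();
        // childNodes() returns TextNodes and Elements interleaved in source order.
        for (Node node : content.childNodes()) {
            if (node instanceof TextNode) {
                String text = ((TextNode) node).text().trim();
                if (!text.isEmpty()) {
                    parts.add("text: " + text); // skip whitespace-only runs
                }
            } else if (node instanceof Element) {
                Element el = (Element) node;
                if (el.tagName().equals("h2")) {
                    parts.add("subtitle: " + el.text());
                } else if (el.tagName().equals("ul")) {
                    for (Element li : el.children()) {
                        parts.add("item: " + li.text());
                    }
                }
                // Other elements, such as <br />, are ignored in this sketch.
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String xml = "<content> Some text blalalala <h2>Small subtitle</h2> "
                + "Some more text blbla <ul class=\"list\"> <li>List item 1</li> "
                + "<li>List item 2</li> </ul> <br /> Even more freakin text </content>";
        for (String part : walk(xml)) {
            System.out.println(part);
        }
    }
}
```

Because childNodes() keeps text nodes and elements in source order, each text run ends up in the list between the elements that surround it, which is the positional information that ownText() and textNodes() alone do not give.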
