I receive the following XML, which represents a news article:
<content>
  Some text blalalala
  <h2>Small subtitle</h2>
  Some more text blbla
  <ul class="list">
    <li>List item 1</li>
    <li>List item 2</li>
  </ul>
  <br />
  Even more freakin text
</content>
I know that the format is not perfect, but for now I have to accept it.
The article should look like this:
- Some text blalalala
- Small subtitle
- List with items
- Even more freakin text
I am parsing this XML with Jsoup. I can get the text directly inside the <content> tag using doc.ownText(), but then I have no idea where the other elements (such as the subtitle) sit relative to that text; I only get one big String.
Would it be better to use an event-based parser (I hate them :(), or is it possible to do something like doc.getTextUntilTagAppears("tagName")?
Edit: for clarification, I know how to get the elements under <content>. My problem is getting the text directly inside <content>, because it is split up every time a child element interrupts it.
I found out that I can get all the text in the content using .textNodes(). That works fine, but again I don't know where each text node belongs in my article (one goes at the top, before the h2, the other at the bottom).
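The ordering problem goes away if the direct children of <content> are walked in document order, instead of asking for all the text at once. Below is a minimal sketch of that idea. It assumes the fragment is well-formed XML, and it uses the JDK's built-in DOM parser rather than Jsoup so that it runs without extra dependencies. ContentWalker and walk are hypothetical names. Jsoup offers a similar traversal through Element.childNodes(), which returns TextNode and Element instances in document order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: lists the pieces of the root element in document order.
class ContentWalker {

    static List<String> walk(String xml) {
        Node root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            // Assumption: malformed input is treated as a caller error.
            throw new IllegalArgumentException("Could not parse XML", e);
        }

        List<String> parts = new ArrayList<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Collapse runs of whitespace so indentation does not leak into the text.
            String text = child.getTextContent().trim().replaceAll("\\s+", " ");
            if (text.isEmpty()) {
                continue; // whitespace-only text nodes and empty elements such as <br />
            }
            if (child.getNodeType() == Node.TEXT_NODE) {
                parts.add("text: " + text);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                // For an element, getTextContent() joins the text of all its descendants.
                parts.add(child.getNodeName() + ": " + text);
            }
        }
        return parts;
    }
}
```

Calling ContentWalker.walk(xml) on the sample above should give the leading text first, then the h2, then the middle text, then the ul, then the trailing text, so each piece of loose text keeps its position relative to the elements around it.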