HTML parsing in Android

Question

I am trying to parse HTML for specific data, but I have problems with return characters; at least, I think that is the problem. I use a simple substring method to split the HTML, since I know in advance what I'm looking for.

Here is my parse method:

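public static void parse(String response, String[] hashItem, String[][] startEnd) throws Exception
{
    // For each item, take the text between its start and end markers.
    for (int i = 0; i < hashItem.length; i++)
    {
        // Everything after the first occurrence of the start marker.
        String part = response.substring(response.indexOf(startEnd[i][0]) + startEnd[i][0].length());
        // Up to, but not including, the first occurrence of the end marker.
        String value = part.substring(0, part.indexOf(startEnd[i][1]));
        DATABASE.setHash(hashItem[i], value);
    }
}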

Here is an example of the HTML that gives me problems:

<table cellspacing=0 cellpadding=2 class=smallfont>
<tr onclick="lu();" onmouseover="style.cursor='hand'">
<td class=bodybox nowrap>&nbsp;     21,773,177,147 $&nbsp;</td><td></td>
<td class=bodybox nowrap>&nbsp;        629,991,926 F&nbsp;</td><td></td>
<td class=bodybox nowrap>&nbsp;             24,537 P&nbsp;</td><td></td>
<td class=bodybox nowrap>&nbsp;                  0 T&nbsp;</td>
<td></td><td class=bodybox nowrap>&nbsp;RT&nbsp;</td>

There are hidden return characters in the HTML, but when I add them to the marker strings I search for, it does not work reliably, if at all. Is there a way, or perhaps a better way, to remove hidden characters from HTML to make it easier to parse? Any help is much appreciated, as always.

java android html

Alejandro Huerta Sep 2 '10 at 8:33

4 answers

Use an HTML parser such as Jsoup:

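import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Fetch and parse the page (the URL here is only an example).
Document doc = Jsoup.connect("http://jsoup.org").get();
// Select every <td> element with class "bodybox".
Elements tds = doc.select("td.bodybox");

for (Element td : tds) {
  // Text content of the cell, with the tags stripped.
  String tdText = td.text();
}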
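
If adding a library is not an option, the substring approach from the question can be made somewhat less fragile by normalizing whitespace before searching. The following is only a minimal sketch using the standard library: the class name HtmlText and the methods normalize and between are hypothetical, and the markers in main are chosen purely for illustration from the sample HTML above. It replaces &nbsp; entities, collapses every run of whitespace (including hidden \r and \n) to a single space, and returns null instead of failing when a marker is missing.

```java
final class HtmlText {

    /**
     * Replaces &nbsp; entities and non-breaking spaces with plain spaces,
     * then collapses each whitespace run (spaces, tabs, \r, \n) to one space.
     */
    static String normalize(String html) {
        return html.replace("&nbsp;", " ")
                   .replace('\u00a0', ' ')
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    /**
     * Returns the trimmed text between the first occurrence of start and the
     * next occurrence of end, or null if either marker is not found.
     */
    static String between(String text, String start, String end) {
        int from = text.indexOf(start);
        if (from < 0) {
            return null; // start marker missing
        }
        from += start.length();
        int to = text.indexOf(end, from);
        if (to < 0) {
            return null; // end marker missing
        }
        return text.substring(from, to).trim();
    }

    public static void main(String[] args) {
        // Two rows from the sample HTML, separated by a hidden \r\n.
        String html =
            "<td class=bodybox nowrap>&nbsp;     21,773,177,147 $&nbsp;</td><td></td>\r\n"
          + "<td class=bodybox nowrap>&nbsp;        629,991,926 F&nbsp;</td><td></td>";

        String clean = normalize(html);

        // After normalization the line break is a single space, so a marker
        // that spans two rows can be written with one ordinary space.
        System.out.println(between(clean, "<td class=bodybox nowrap>", " $"));
        System.out.println(between(clean, "</td><td></td> <td class=bodybox nowrap>", " F"));
    }
}
```

Because the line break is reduced to one ordinary space, markers no longer need to guess whether the page uses \n or \r\n. This remains string matching rather than real HTML parsing, so it will still break if the markup changes; a parser such as Jsoup is likely the more robust choice.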