I am implementing a web robot that should collect all the links from a page and select the ones I need. I have this working, except for one problem: links nested inside a "table" or "span" tag are not found. Here is my code snippet:
Document doc = Jsoup.connect(url)
        .timeout(TIMEOUT * 1000)
        .get();
Elements elts = doc.getElementsByTag("a");
And here is an example of the HTML:
<table> <tr><td><a href="www.example.com"></a></td></tr> </table>
My code does not pick up links like this one, and using doc.select doesn't help either. My question is: how do I get all the links from the page?
EDIT: I think I know where the problem is. The page I am having problems with is very poorly written; the HTML validator reports a huge number of errors. Could this be the cause?
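To narrow this down, the sample table above can be parsed in isolation, without fetching the real page. This is only a rough sketch: the class name LinkCheck, the helper extractLinks, and the base URI http://example.org/ are made up for illustration, and it assumes jsoup is on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkCheck {

    // Hypothetical helper (not part of jsoup): parses an HTML string
    // and returns the raw href value of every <a> element that has one.
    static List<String> extractLinks(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        List<String> links = new ArrayList<>();
        // "a[href]" matches anchors at any nesting depth, including inside tables.
        for (Element a : doc.select("a[href]")) {
            links.add(a.attr("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // The well-formed sample from the question; the base URI is a placeholder.
        String html = "<table> <tr><td><a href=\"www.example.com\"></a></td></tr> </table>";
        for (String href : extractLinks(html, "http://example.org/")) {
            System.out.println(href);
        }
    }
}
```

If the href is printed here, table nesting by itself is probably not the issue, and the broken markup of the real page becomes the more likely suspect. Feeding the same helper a saved copy of the problem page would be the next check.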