I use the HtmlUnit library for Java to manipulate websites programmatically. I cannot find a working solution to my problem: how can I determine that all AJAX calls have finished, so that I can return a fully loaded web page? Here is what I tried.
First, I create a WebClient instance and pass it to my method processWebPage(String url, WebClient webClient):
    WebClient webClient = null;
    try {
        webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
        webClient.setThrowExceptionOnScriptError(false);
        webClient.setThrowExceptionOnFailingStatusCode(false);
        webClient.setJavaScriptEnabled(true);
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
    } catch (Exception e) {
        System.out.println("Error");
    }
    HtmlPage currentPage = processWebPage("http://www.example.com", webClient);
And here is my method that should return a fully loaded web page:
    private static HtmlPage processWebPage(String url, WebClient webClient) {
        HtmlPage page = null;
        try {
            page = webClient.getPage(url);
        } catch (Exception e) {
            System.out.println("Get page error");
        }
        int z = webClient.waitForBackgroundJavaScript(1000);
        int counter = 1000;
        while (z > 0) {
            counter += 1000;
            z = webClient.waitForBackgroundJavaScript(counter);
            if (z == 0) {
                break;
            }
            synchronized (page) {
                System.out.println("wait");
                try {
                    page.wait(500);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
        System.out.println(page.asXml());
        return page;
    }
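As I understand it, waitForBackgroundJavaScript returns the number of background JavaScript jobs that are still running or scheduled when the timeout expires, so z should be 0 once no JavaScript jobs remain.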
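To make the intent of the loop above clearer, here is a stripped-down sketch of the pattern I am after: keep polling a "pending jobs" counter until it reports 0 or an overall deadline passes. The class name BackgroundJobWaiter, the method waitUntilIdle, and its parameters are made up for illustration. The IntSupplier is a stand-in for a call such as webClient.waitForBackgroundJavaScript(1000), so the sketch runs without HtmlUnit on the classpath.

```java
import java.util.function.IntSupplier;

public class BackgroundJobWaiter {

    /**
     * Polls pendingJobs until it reports 0 or maxWaitMillis has elapsed.
     * Returns the last reported count; 0 means everything finished in time.
     * pendingJobs is a hypothetical stand-in for the HtmlUnit call.
     */
    public static int waitUntilIdle(IntSupplier pendingJobs, long maxWaitMillis, long pollMillis) {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        int remaining = pendingJobs.getAsInt();
        while (remaining > 0 && System.nanoTime() < deadline) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                break;
            }
            remaining = pendingJobs.getAsInt();
        }
        return remaining;
    }

    public static void main(String[] args) {
        // Fake counter that drops by one on every poll (illustration only).
        int[] jobs = {3};
        int left = waitUntilIdle(() -> Math.max(0, jobs[0]--), 2_000, 10);
        System.out.println("jobs left: " + left);
    }
}
```

With HtmlUnit, I would presumably pass something like () -> webClient.waitForBackgroundJavaScript(1000) as the supplier. A non-zero return value would then mean that jobs were still pending when the deadline passed.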
Any thoughts? Thanks in advance.
EDIT: I found a partially working solution, but it requires me to know what the response page looks like. For example, if a fully loaded page contains the text "complete", my solution would be:
    HtmlPage page = null;
    int PAGE_RETRY = 10;
    try {
        page = webClient.getPage("http://www.example.com");
    } catch (Exception e) {
        e.printStackTrace();
    }
    for (int i = 0; !page.asXml().contains("complete") && i < PAGE_RETRY; i++) {
        try {
            Thread.sleep(1000 * (i + 1));
            page = webClient.getPage("http://www.example.com");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
But what would be the solution if I don't know what a fully loaded page looks like?
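One heuristic I have considered, purely as an untested idea, is to treat the page as loaded once its content stops changing between consecutive checks. Below is a minimal sketch of that idea. The class StableContentWaiter, the method waitUntilStable, and its parameters are hypothetical names, and the Supplier<String> stands in for something like page.asXml(). I assume it could be fooled by pages that update periodically or pause between requests, so it is not a real answer to the question.

```java
import java.util.function.Supplier;

public class StableContentWaiter {

    /**
     * Returns true once snapshot yields the same string for
     * requiredStableChecks consecutive polls, or false if that does
     * not happen within maxChecks polls.
     */
    public static boolean waitUntilStable(Supplier<String> snapshot,
                                          int requiredStableChecks,
                                          int maxChecks,
                                          long pollMillis) {
        String previous = snapshot.get();
        int stable = 0;
        for (int i = 0; i < maxChecks && stable < requiredStableChecks; i++) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            String current = snapshot.get();
            if (current.equals(previous)) {
                stable++;            // unchanged since the last poll
            } else {
                stable = 0;          // content changed, start counting again
                previous = current;
            }
        }
        return stable >= requiredStableChecks;
    }

    public static void main(String[] args) {
        // Fake page states standing in for successive page.asXml() results.
        String[] states = {"<p>loading</p>", "<p>half</p>", "<p>done</p>"};
        int[] calls = {0};
        boolean ok = waitUntilStable(
                () -> states[Math.min(calls[0]++, states.length - 1)], 2, 10, 10);
        System.out.println("stable: " + ok);
    }
}
```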