HtmlUnit: return a fully loaded page

I use the HtmlUnit library for Java to programmatically manipulate websites. I cannot find a working solution for my problem: how to determine that all AJAX calls are finished and return a fully loaded web page? Here is what I tried:

First, I create an instance of WebClient and call my method processWebPage(String url, WebClient webClient):

    WebClient webClient = null;
    try {
        webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
        webClient.setThrowExceptionOnScriptError(false);
        webClient.setThrowExceptionOnFailingStatusCode(false);
        webClient.setJavaScriptEnabled(true);
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
    } catch (Exception e) {
        System.out.println("Error");
    }
    HtmlPage currentPage = processWebPage("http://www.example.com", webClient);

And here is my method that should return a fully loaded web page:

    private static HtmlPage processWebPage(String url, WebClient webClient) {
        HtmlPage page = null;
        try {
            page = webClient.getPage(url);
        } catch (Exception e) {
            System.out.println("Get page error");
        }
        int z = webClient.waitForBackgroundJavaScript(1000);
        int counter = 1000;
        while (z > 0) {
            counter += 1000;
            z = webClient.waitForBackgroundJavaScript(counter);
            if (z == 0) {
                break;
            }
            synchronized (page) {
                System.out.println("wait");
                try {
                    page.wait(500);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
        System.out.println(page.asXml());
        return page;
    }

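The z variable should become 0 once no background JavaScript jobs remain, because waitForBackgroundJavaScript returns the number of jobs still executing or waiting to execute when the timeout expires.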

Any thoughts? Thanks in advance.

EDIT: I found a partially working solution for my problem, but it requires knowing in advance what the loaded page looks like. For example, if a fully loaded page contains the text "complete", my solution would be:

    HtmlPage page = null;
    int PAGE_RETRY = 10;
    try {
        page = webClient.getPage("http://www.example.com");
    } catch (Exception e) {
        e.printStackTrace();
    }
    for (int i = 0; !page.asXml().contains("complete") && i < PAGE_RETRY; i++) {
        try {
            Thread.sleep(1000 * (i + 1));
            page = webClient.getPage("http://www.example.com");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

But what would be the solution if I don't know what a fully loaded page looks like?

1 answer

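Try the following, which polls the window's JavaScript job manager until no background jobs are left:

    // Requires: import com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManager;
    HtmlPage page = null;
    try {
        page = webClient.getPage(url);
    } catch (Exception e) {
        System.out.println("Get page error");
    }
    JavaScriptJobManager manager = page.getEnclosingWindow().getJobManager();
    // Poll once per second while background JavaScript jobs are still pending.
    while (manager.getJobCount() > 0) {
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop waiting.
            Thread.currentThread().interrupt();
            break;
        }
    }
    System.out.println(page.asXml());
    return page;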
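
One caveat: a loop like the one above may never finish if the page keeps scheduling jobs (for example a repeating timer), so it can help to cap the total wait. Below is a minimal sketch of such a bounded poll. The class name BoundedWait, the method waitUntilIdle, and its parameters are hypothetical names chosen for illustration and are not part of HtmlUnit; the helper only uses the Java standard library and takes the job count as an IntSupplier.

```java
import java.util.function.IntSupplier;

// Hypothetical helper (not part of HtmlUnit): polls a job counter with a deadline.
public final class BoundedWait {
    private BoundedWait() {}

    /**
     * Polls pendingJobs until it reports 0 or timeoutMillis has elapsed.
     * Returns true if the count reached 0, false if the deadline passed first.
     */
    public static boolean waitUntilIdle(IntSupplier pendingJobs,
                                        long timeoutMillis,
                                        long pollMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (pendingJobs.getAsInt() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: jobs were still pending at the deadline
            }
            Thread.sleep(pollMillis);
        }
        return true;
    }
}
```

Assuming the manager variable from the snippet above, the call could look like BoundedWait.waitUntilIdle(manager::getJobCount, 30000, 500). A false result means the page still had pending jobs after 30 seconds, and the caller decides whether to use the page as it is.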

Source: https://habr.com/ru/post/946653/

