Get the modified HTML content after JavaScript has updated it? (HtmlUnit)

I am having trouble figuring out how to get the contents of some HTML after JavaScript has updated it.

In particular, I am trying to get the current time from the US Navy Master Clock. The page has an h1 element with the ID USNOclk, which displays the current time.

When the page first loads, this element is set to display "Loading..." and then JavaScript kicks in and updates it to the current time using

    function showTime() {
        document.getElementById('USNOclk').innerHTML="Loading...<br />";
        xmlHttp=GetXmlHttpObject();
        if (xmlHttp==null){
            document.getElementById('USNOclk').innerHTML="Sorry, browser incapatible. <BR />";
            return;
        }
        refresher = 0;
        startResponse = new Date().getTime();
        var url="http://tycho.usno.navy.mil/cgi-bin/time.pl?n="+ startResponse;
        xmlHttp.onreadystatechange=stateChanged;
        xmlHttp.open("GET",url,true);
        xmlHttp.send(null);
    }

So the problem is that I'm not sure how to get the updated time. When I check the element, I still see "Loading..." as the content of the h1 element.

I double-checked that JavaScript is enabled, and I tried calling the waitForBackgroundJavaScript method on the WebClient, hoping this would give the JavaScript time to run the update. So far, no success.

My current code is:

    import com.gargoylesoftware.htmlunit._
    import com.gargoylesoftware.htmlunit.html.HtmlPage

    object AtomicTime {
      def main(args: Array[String]): Unit = {
        val url = "http://tycho.usno.navy.mil/what.html"
        val client = new WebClient(BrowserVersion.CHROME)
        println(client.isJavaScriptEnabled()) // returns true
        client.waitForBackgroundJavaScript(10000)
        // client.waitForBackgroundJavaScriptStartingBefore(10000) // tried this one too without success
        var response: HtmlPage = client.getPage(url)
        println(response.asText())
      }
    }

How can I get the JavaScript to run so that it updates the HTML?

2 answers

I figured it out!

HtmlPage objects have an executeJavaScript(String) method, which can be used to run the showTime script. Only once the script has actually started does waitForBackgroundJavaScript become relevant.

The code I ended up with:

    import com.gargoylesoftware.htmlunit._
    import com.gargoylesoftware.htmlunit.html.HtmlPage

    object AtomicTime {
      def main(args: Array[String]): Unit = {
        val url = "http://tycho.usno.navy.mil/what.html"
        val client = new WebClient(BrowserVersion.CHROME)
        val response: HtmlPage = client.getPage(url)
        // Invoke the page's own showTime function; the parentheses are needed to call it.
        response.executeJavaScript("showTime()")
        printf("Current AtomicTime: %s", getUpdatedResponse(response, client))
      }

      // Polls until the h1 no longer shows the placeholder text.
      // Note: this loops forever if the request never completes.
      def getUpdatedResponse(page: HtmlPage, client: WebClient): String = {
        while (page.getElementById("USNOclk").asText() == "Loading...") {
          client.waitForBackgroundJavaScript(200)
        }
        page.getElementById("USNOclk").asText()
      }
    }

Although the waitForBackgroundJavaScript method seems like a good option, it's worth mentioning that it is experimental. You can see this in the JavaDocs, which state:

Experimental API: may be changed in the next version and may not work perfectly yet!

Therefore, I recommend a slightly more involved approach:

    int amountOfTries = 10;
    while (amountOfTries > 0 && CONDITION) {
        amountOfTries--;
        synchronized (page) {
            page.wait(1000);
        }
    }

Note that the amountOfTries counter is what lets the loop give up if something goes wrong with the request. Without it, you would end up in an endless loop, so be careful with that.

Then replace CONDITION with your actual condition. In this case, it is

 page.getElementById("USNOclk").asText().equals("Loading...") 

In short, the code above re-checks the condition once per second, for at most 10 seconds, and stops as soon as the condition becomes false (that is, once the element no longer shows "Loading...").

Of course, the best approach would be to extract this retry behavior into a separate method so that you can reuse the logic with different conditions.
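One way such a reusable method might look is sketched below. This is only an illustration, not part of HtmlUnit: the class name RetryUntil and the method waitWhile are made up for this example, and it uses Thread.sleep in place of the synchronized page.wait(1000) from the loop above, on the assumption that the background JavaScript runs on its own thread. The main method uses a timer as a stand-in for the page so that the example runs without any dependencies.

```java
import java.util.function.BooleanSupplier;

public class RetryUntil {

    /**
     * Hypothetical helper: keeps polling while stillPending returns true,
     * sleeping between checks, and gives up after maxTries checks.
     * Returns true if the condition cleared, false if we gave up.
     */
    static boolean waitWhile(BooleanSupplier stillPending, int maxTries, long sleepMillis) {
        int triesLeft = maxTries;
        while (triesLeft > 0 && stillPending.getAsBoolean()) {
            triesLeft--;
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and report failure to the caller.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        // Check once more so a result that arrived during the last sleep still counts.
        return !stillPending.getAsBoolean();
    }

    public static void main(String[] args) {
        // Stand-in for the page: pretends to stay "Loading..." for about 30 ms.
        long readyAt = System.currentTimeMillis() + 30;
        boolean loaded = waitWhile(() -> System.currentTimeMillis() < readyAt, 10, 10);
        System.out.println(loaded ? "content updated" : "gave up waiting");
    }
}
```

With HtmlUnit, the condition passed in would be the one from this question, for example waitWhile(() -> page.getElementById("USNOclk").asText().equals("Loading..."), 10, 1000). The boolean result lets the caller tell a finished update apart from a timeout.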


Source: https://habr.com/ru/post/950208/

