HtmlUnit does not create an HtmlPage object

I am very new to HtmlUnit and I am trying to scrape a website that uses JavaScript to modify its markup. I heard that HtmlUnit was the best way to do this, since it returns the final HTML the way a browser would, without needing an actual browser.

However, as you will see, I can’t even obtain an HtmlPage object without getting a huge exception that is impossible to understand (at least given my almost zero experience with HtmlUnit).

Here is my code:

    import java.io.IOException;

    import com.gargoylesoftware.htmlunit.*;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    public class Main {

        public static void main(String[] args) {
            Main scraper = new Main();
            scraper.testingGargoyle();
        }

        private void testingGargoyle() {
            String myUrl = "https://www.wearvr.com/#game_id=game_4";
            WebClient webClient = new WebClient();
            try {
                HtmlPage myPage = ((HtmlPage) webClient.getPage(myUrl));
            } catch (FailingHttpStatusCodeException | IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }

And this is the exception that is thrown:

    Apr 30, 2015 5:43:50 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
    WARNING: Obsolete content type encountered: 'application/x-javascript'.
    Apr 30, 2015 5:43:50 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter runtimeError
    SEVERE: runtimeError: message=[The data necessary to complete this operation is not yet available.] sourceName=[https://load.sumome.com/] line=[1] lineSource=[null] lineOffset=[0]
    Exception in thread "main" ======= EXCEPTION START ========
    EcmaError: lineNumber=[19] column=[0] lineSource=[<no source>] name=[TypeError] sourceName=[https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js] message=[TypeError: Cannot find function bind in object function (e, n, r) {...}. (https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js#19)]
    com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot find function bind in object function (e, n, r) {...}. (https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js#19)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:847)
        at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:620)
        at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:733)
        at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:1096)
        at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:395)
        at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:270)
        at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:290)
        at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:793)
        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
        at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:751)
        at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170)
        at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072)
        at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206)
        at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330)
        at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3126)
        at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2093)
        at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:920)
        at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
        at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:1017)
        at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:248)
        at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:194)
        at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:268)
        at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:156)
        at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:471)
        at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:345)
        at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:410)
        at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:395)
        at Main.testingGargoyle(Main.java:19)
        at Main.main(Main.java:10)
    Caused by: net.sourceforge.htmlunit.corejs.javascript.EcmaError: TypeError: Cannot find function bind in object function (e, n, r) {...}. (https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js#19)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3629)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3613)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError(ScriptRuntime.java:3634)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError2(ScriptRuntime.java:3650)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.notFunctionError(ScriptRuntime.java:3714)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getPropFunctionAndThisHelper(ScriptRuntime.java:2233)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getPropFunctionAndThis(ScriptRuntime.java:2215)
        at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1333)
        at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
        at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
        at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411)
        at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:309)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3057)
        at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:724)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:832)
        ... 31 more
    Enclosed exception:
    net.sourceforge.htmlunit.corejs.javascript.EcmaError: TypeError: Cannot find function bind in object function (e, n, r) {...}. (https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js#19)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3629)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3613)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError(ScriptRuntime.java:3634)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError2(ScriptRuntime.java:3650)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.notFunctionError(ScriptRuntime.java:3714)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getPropFunctionAndThisHelper(ScriptRuntime.java:2233)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getPropFunctionAndThis(ScriptRuntime.java:2215)
        at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1333)
        at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:19)
        at script.r(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:16)
        at script.r(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:384)
        at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:16)
        at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:16)
        at script.t(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:1)
        at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:16)
        at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:16)
        at script.t(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:1)
        at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:7)
        at script.t(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:1)
        at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:463)
        at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:463)
        at script.t(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:1)
        at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:1)
        at script.t(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:1)
        at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:1)
        at script(https://www.wearvr.com/assets/scripts/bundle.b4038a088bb1abfcf55c.js:1)
        at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
        at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
        at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411)
        at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:309)
        at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3057)
        at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:724)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:832)
        at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:620)
        at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
        at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:733)
        at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:1096)
        at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:395)
        at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:270)
        at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:290)
        at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:793)
        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
        at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:751)
        at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170)
        at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072)
        at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206)
        at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330)
        at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3126)
        at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2093)
        at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:920)
        at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
        at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
        at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:1017)
        at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:248)
        at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:194)
        at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:268)
        at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:156)
        at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:471)
        at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:345)
        at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:410)
        at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:395)
        at Main.testingGargoyle(Main.java:19)
        at Main.main(Main.java:10)
    ======= EXCEPTION END ========

I told you it was huge. How can I get around this and get the final source of the page so that I can scrape it?

Thanks in advance!

2 answers

Exceptions are thrown for several reasons: malformed HTML, errors in the page's scripts, or resources that cannot be found, such as CSS, script, or image files (for example, <img src="bla.gif"> where bla.gif returns HTTP 404).

So, to keep navigating the page instead of stopping at the first error or problem, we use these options:

    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);

You can also install do-nothing handlers to stop HtmlUnit from logging CSS and JavaScript errors verbosely to the console:

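    webClient.setCssErrorHandler(new SilentCssErrorHandler());
    // JavaScriptErrorListener is an interface, so the anonymous class
    // must implement each of its methods with an empty body.
    webClient.setJavaScriptErrorListener(new JavaScriptErrorListener() { /* empty method bodies */ });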
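If writing a listener feels like too much, switching the logging off is another option. The timestamps and WARNING/SEVERE prefixes in the question look like java.util.logging output, so, assuming HtmlUnit's messages do end up in java.util.logging in your setup, a sketch along these lines should quiet them. The class name QuietHtmlUnitLogging is made up for this example; the logger name is just HtmlUnit's package name as it appears in the trace.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class QuietHtmlUnitLogging {
    // java.util.logging keeps only weak references to loggers, so hold a
    // strong one here; otherwise the level change could be lost if the
    // logger were garbage collected.
    private static final Logger HTMLUNIT_LOGGER =
            Logger.getLogger("com.gargoylesoftware.htmlunit");

    /** Turns off log output for the HtmlUnit package and everything below it. */
    static void silence() {
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        silence();
        // Loggers below the package inherit the level, so this prints "false".
        Logger child = Logger.getLogger(
                "com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter");
        System.out.println(child.isLoggable(Level.SEVERE));
    }
}
```

Call silence() once before creating the WebClient. This only hides the messages; whether script errors abort the page load is still controlled by the options shown earlier.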

A small test example:

    @Test
    public void TestCall() throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        WebClient webClient = new WebClient(BrowserVersion.CHROME);
        webClient.getOptions().setUseInsecureSSL(true); // ignore SSL certificate errors
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);

        String url = "https://www.wearvr.com/#game_id=game_4";
        HtmlPage myPage = webClient.getPage(url);
        webClient.waitForBackgroundJavaScriptStartingBefore(200);
        webClient.waitForBackgroundJavaScript(20000);

        // do stuff on the page, e.g. myPage.getElementById("main")
        // myPage.asXml() <- tags and elements
        System.out.println(myPage.asText());
    }
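If the element you need only shows up after the page's scripts have run, it may be more reliable to poll for it than to guess a wait time. The following is a minimal sketch of such a polling helper in plain Java; Poll and until are hypothetical names, not part of HtmlUnit, and the lambda in main only stands in for a real check such as myPage.getElementById("main") != null.

```java
import java.util.function.BooleanSupplier;

class Poll {
    /**
     * Re-checks the condition every intervalMillis until it holds or
     * timeoutMillis has elapsed. Returns true if the condition held.
     */
    static boolean until(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in condition that becomes true on the third check.
        boolean ready = until(() -> ++calls[0] >= 3, 1_000, 10);
        System.out.println(ready + " after " + calls[0] + " checks");
    }
}
```

The condition is always checked at least once, even with a timeout of zero, so a page that is already complete does not wait at all.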

Try using a different browser, for example:

    String myUrl = "https://www.wearvr.com/#game_id=game_4";
    try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
        HtmlPage myPage = ((HtmlPage) webClient.getPage(myUrl));
        System.out.println(myPage.asXml());
    } catch (FailingHttpStatusCodeException | IOException e) {
        e.printStackTrace();
    }

However, the failure may also be a bug in HtmlUnit's IE8 emulation.
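To see which script call actually failed without reading the whole trace, it can help to print only the innermost cause (the "Caused by:" entry in the question's output). Here is a small sketch in plain Java; RootCause is a hypothetical helper name, and the two exceptions in main merely imitate the nesting seen in the question.

```java
class RootCause {
    /** Follows getCause() links down to the innermost exception. */
    static Throwable of(Throwable t) {
        Throwable current = t;
        // The second check guards against a cause chain that points back at itself.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        Exception inner = new IllegalStateException("TypeError: Cannot find function bind");
        Exception outer = new RuntimeException("script failed", inner);
        // Prints "TypeError: Cannot find function bind".
        System.out.println(of(outer).getMessage());
    }
}
```

In the question's code this would mean catching the exception around getPage and printing RootCause.of(e).getMessage() instead of the full stack trace.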


Source: https://habr.com/ru/post/1438327/

