Following https://developers.google.com/webmasters/ajax-crawling/docs/html-snapshot, I am trying to use HtmlUnit (2.13) to create a snapshot of a web page that uses AngularJS (1.2.1).
My Java code is:
    WebClient webClient = new WebClient();
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
    webClient.setCssErrorHandler(new SilentCssErrorHandler());
    webClient.getOptions().setCssEnabled(true);
    webClient.getOptions().setRedirectEnabled(false);
    webClient.getOptions().setAppletEnabled(false);
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.getOptions().setPopupBlockerEnabled(true);
    webClient.getOptions().setTimeout(10000);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(true);
    webClient.getOptions().setThrowExceptionOnScriptError(true);
    webClient.getOptions().setPrintContentOnFailingStatusCode(true);

    HtmlPage page = webClient.getPage(new WebRequest(new URL("..."), HttpMethod.GET));
    webClient.waitForBackgroundJavaScript(5000);
    String result = page.asXml();
Although webClient.getPage(...) does not throw any exception, the result string still contains unevaluated Angular expressions such as
<div> {{name}} </div>
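To check a snapshot for this symptom automatically, something like the following could be used. It is only a minimal sketch: the class and method names are made up for illustration, and it assumes the page uses Angular's default `{{ }}` interpolation symbols (they can be changed via $interpolateProvider). It can also report false positives if the markup legitimately contains literal double braces, for example in inline templates.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of HtmlUnit or AngularJS.
final class AngularSnapshotCheck {

    // Matches a leftover interpolation such as {{name}} or {{ user.name | uppercase }}.
    // Assumes the default Angular start/end symbols.
    private static final Pattern BINDING =
            Pattern.compile("\\{\\{.*?\\}\\}", Pattern.DOTALL);

    private AngularSnapshotCheck() {
    }

    // Returns true if the HTML still contains at least one {{ ... }} expression.
    static boolean hasUnrenderedBindings(String html) {
        return html != null && BINDING.matcher(html).find();
    }
}
```

With the code above, `AngularSnapshotCheck.hasUnrenderedBindings(result)` would flag the snapshot as not fully rendered.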
I am aware of http://htmlunit.10904.n7.nabble.com/htmlunit-to-scrape-angularjs-td29931.html#a30075, but the recommendation given there does not work either.
Of course, the same GET request works without exception in all current browsers.
Any ideas / experience on how to get HtmlUnit to work with AngularJS?
Update:
I filed an HtmlUnit bug report. For the moment, I have switched my implementation to PhantomJS. Perhaps this code snippet helps others with a similar problem:
    System.setProperty("phantomjs.binary.path", "phantomjs.exe");

    DesiredCapabilities caps = new DesiredCapabilities();
    caps.setJavascriptEnabled(true);
    caps.setCapability("takesScreenshot", false);

    PhantomJSDriver driver = new PhantomJSDriver(caps);
    driver.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS);
    // WebDriver.get(...) expects the URL as a String, not a java.net.URL
    driver.get("...");
    String result = driver.getPageSource();
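Note that the implicit wait only applies to element lookups, so the page source may still be read before Angular has finished rendering. One possible workaround is to poll the source until it looks ready. The following is a rough sketch, assuming Java 8 or later; `SnapshotPoller` and `pollUntil` are names invented here, not part of Selenium or PhantomJS.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper: re-reads a snapshot until a readiness check passes or time runs out.
final class SnapshotPoller {

    private SnapshotPoller() {
    }

    // Returns the first snapshot accepted by 'ready', or the last one read
    // if the timeout expires (or the thread is interrupted) first.
    static String pollUntil(Supplier<String> source, Predicate<String> ready,
                            long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        String latest = source.get();
        while (!ready.test(latest) && System.nanoTime() < deadline) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                break;
            }
            latest = source.get();
        }
        return latest;
    }
}
```

For example, `SnapshotPoller.pollUntil(driver::getPageSource, html -> !html.contains("{{"), 30000, 250)` would keep re-reading the page source for up to 30 seconds while it still contains double braces. The caller still has to decide what to do if the returned snapshot is not ready.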
Update 2: I stopped rendering my pages myself, as the Google crawler now renders Angular sites.
cnmuc