I have a GWT application that I am trying to index.
I use HtmlUnit to get the generated HTML content:
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_3_6); HtmlPage refDesing = webClient.getPage("http://localhost:8080/MyGWTApp/#page2"); FileOutputStream fos1 = new FileOutputStream("D:\\work\\out\\page2.html"); fos1.write(refDesing.asXml().getBytes()); fos1.close();
But I get the following error, and the page returns approximately empty!
Dec 22, 2010 6:16:25 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify WARNING: Expected content type of 'application/javascript' or 'application/ecmascript' for remotely loaded JavaScript element at 'http://xxxxxxxxxxxx/xxxxxxxx/xxxxxxxx/xxxxxxxxxx.nocache.js', but got 'application/x-javascript'. Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error WARNING: CSS error: null [485:24] Error in expression. Invalid token "=". Was expecting one of: <S>, <COMMA>, "/", <PLUS>, "-", <HASH>, <STRING>, ")", <URI>, "inherit", <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <NUMBER>, <FUNCTION>, <IDENT>. Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error WARNING: CSS error: null [485:29] Error in style rule. Invalid token "\n". Was expecting one of: "}", ";". Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning WARNING: CSS warning: null [485:29] Ignoring the following declarations in this rule. Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error WARNING: CSS error: null [518:24] Error in expression. Invalid token "=". Was expecting one of: <S>, <COMMA>, "/", <PLUS>, "-", <HASH>, <STRING>, ")", <URI>, "inherit", <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <NUMBER>, <FUNCTION>, <IDENT>. Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error WARNING: CSS error: null [518:29] Error in style rule. Invalid token "\n ". Was expecting one of: "}", ";". Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning WARNING: CSS warning: null [518:29] Ignoring the following declarations in this rule. Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error WARNING: CSS error: null [541:24] Error in expression. Invalid token "=". Was expecting one of: <S>, <COMMA>, "/", <PLUS>, "-", <HASH>, <STRING>, ")", <URI>, "inherit", <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <NUMBER>, <FUNCTION>, <IDENT>. Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error WARNING: CSS error: null [541:29] Error in style rule. Invalid token "\n ". Was expecting one of: "}", ";". Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning WARNING: CSS warning: null [541:29] Ignoring the following declarations in this rule. Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error WARNING: CSS error: null [951:24] Error in expression. Invalid token "=". Was expecting one of: <S>, <COMMA>, "/", <PLUS>, "-", <HASH>, <STRING>, ")", <URI>, "inherit", <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <NUMBER>, <FUNCTION>, <IDENT>. Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error WARNING: CSS error: null [951:29] Error in style rule. Invalid token "\n". Was expecting one of: "}", ";". Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning WARNING: CSS warning: null [951:29] Ignoring the following declarations in this rule. Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error WARNING: CSS error: null [977:24] Error in expression. Invalid token "=". Was expecting one of: <S>, <COMMA>, "/", <PLUS>, "-", <HASH>, <STRING>, ")", <URI>, "inherit", <EMS>, <EXS>, <LENGTH_PX>, <LENGTH_CM>, <LENGTH_MM>, <LENGTH_IN>, <LENGTH_PT>, <LENGTH_PC>, <ANGLE_DEG>, <ANGLE_RAD>, <ANGLE_GRAD>, <TIME_MS>, <TIME_S>, <FREQ_HZ>, <FREQ_KHZ>, <DIMENSION>, <PERCENTAGE>, <NUMBER>, <FUNCTION>, <IDENT>. Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler error WARNING: CSS error: null [977:29] Error in style rule. Invalid token "\n". Was expecting one of: "}", ";". Dec 22, 2010 6:16:27 PM com.gargoylesoftware.htmlunit.DefaultCssErrorHandler warning WARNING: CSS warning: null [977:29] Ignoring the following declarations in this rule.
EDIT:
What I mean by roughly empty is a snapshot of the returned HTML:
Please note that HtmlUnit does not return all the data that is displayed on the original page (which original was received from the database). Also what? mean? I do not think this means that any coding error causes all words to be clear ASCII characters.
<td align="center" style="vertical-align: top;"> <table class="refDesignGrid" cellspacing="5"> <colgroup> <col/> </colgroup> <tbody align="left"> <tr> <td align="left" style="vertical-align: top;"> <table cellpadding="0" class="categoryItem" cellspacing="0"> <tbody align="left"> <tr> <td align="left" style="vertical-align: top;"> <div class="header4"> C++ </div> </td> </tr> </tbody> </table> </td> <td align="left" style="vertical-align: top;"> <table cellpadding="0" class="categoryItem" cellspacing="0"> <tbody align="left"> <tr> <td align="left" style="vertical-align: top;"> <div class="header4"> Java </div> </td> </tr> </tbody> </table> </td> <td align="left"> <table cellpadding="0" class="categoryItem" cellspacing="0"> <tbody align="left"> <tr> <td align="left" style="vertical-align: top;"> <div class="header4"> C# </div> </td> </tr> </tbody> </table> </td> <td> ? </td> </tr> <tr> <td> ? </td> <td> ? </td> <td> ? </td> <td> ? </td> </tr> <tr> <td> ? </td> <td> ? </td> <td> ? </td> <td> ? </td> </tr> <tr> <td> ? </td> <td> ? </td> <td> ? </td> <td> ? </td> </tr> <tr> <td> ? </td> <td> ? </td> <td> ? </td> <td> ? </td> </tr> </tbody> </table> </td> </tr> </tbody> </table> </div>