I'm having a weird issue with HtmlUnit in Java. I use it to download some data from a website; the process looks roughly like this:
1. Log in
2. For each element (car):
   3. Search for the car
   4. Download the zip file from the link
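The control flow above (log in once, then search and download for each car) can be sketched as a small driver loop. `CarSite`, `CarDownloader` and their methods are hypothetical names standing in for the HtmlUnit-based `login()`, `searchCar()` and download code in this post; they are not part of HtmlUnit. The point the sketch makes explicit is that every car reuses the same logged-in session.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical abstraction over the scraping steps; NOT an HtmlUnit type.
interface CarSite {
    void login() throws IOException;
    void searchCar(String regNumber) throws IOException;
    void downloadZip(String regNumber) throws IOException;
}

class CarDownloader {
    // Logs in once, then searches and downloads for each registration number,
    // reusing the same session (and therefore the same WebClient) throughout.
    static void run(CarSite site, List<String> regNumbers) throws IOException {
        site.login();                       // 1 - log in once
        for (String reg : regNumbers) {     // 2 - for each car
            site.searchCar(reg);            // 3 - search for the car
            site.downloadZip(reg);          // 4 - download its zip file
        }
    }
}
```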
Code:
Web client creation:
```java
webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
webClient.setJavaScriptEnabled(true);
webClient.setThrowExceptionOnScriptError(false);
DefaultCredentialsProvider provider = new DefaultCredentialsProvider();
provider.addCredentials(USERNAME, PASSWORD);
webClient.setCredentialsProvider(provider);
webClient.setRefreshHandler(new ImmediateRefreshHandler());
```
Login:
```java
public void login() throws IOException {
    page = (HtmlPage) webClient.getPage(URL);
    HtmlForm form = page.getFormByName("formLogin");

    String user = USERNAME;
    String password = PASSWORD;

    // Enter login and password
    form.getInputByName("LoginSteps$UserName").setValueAttribute(user);
    form.getInputByName("LoginSteps$Password").setValueAttribute(password);

    // Click Login Button
    page = (HtmlPage) form.getInputByName("LoginSteps$LoginButton").click();
    webClient.waitForBackgroundJavaScript(3000);

    // Click on Campa area
    HtmlAnchor link = (HtmlAnchor) page.getElementById("ctl00_linkCampaNoiH");
    page = (HtmlPage) link.click();
    webClient.waitForBackgroundJavaScript(3000);

    System.out.println(page.asText());
}
```
Search for a car on the site:
```java
private void searchCar(String _regNumber) throws IOException {
    // Open search window
    page = page.getElementById("search_gridCampaNoi").click();
    webClient.waitForBackgroundJavaScript(3000);

    // Write plate number
    HtmlInput element = (HtmlInput) page.getElementById("jqg1");
    element.setValueAttribute(_regNumber);
    webClient.waitForBackgroundJavaScript(3000);

    // Click on search
    HtmlAnchor anchor = (HtmlAnchor) page.getByXPath("//*[@id=\"fbox_gridCampaNoi_search\"]").get(0);
    page = anchor.click();
    webClient.waitForBackgroundJavaScript(3000);

    System.out.println(page.asText());
}
```
Download the zip file with the PDFs:
```java
try {
    InputStream is = _link.click().getWebResponse().getContentAsStream();
    File path = new File(new File(DOWNLOAD_PATH), _regNumber);
    if (!path.exists()) {
        path.mkdir();
    }
    writeToFile(is, new File(path, _regNumber + "_pdfs.zip"));
} catch (Exception e) {
    e.printStackTrace();
}
```
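The download code calls a `writeToFile(InputStream, File)` helper whose body isn't shown. A minimal, `java.io`-only sketch of what such a helper might look like, assuming it should copy the whole response stream to disk and close both streams afterwards. `ZipSaver` and `targetFile` are hypothetical names; `targetFile` only reproduces the `DOWNLOAD_PATH/<regNumber>/<regNumber>_pdfs.zip` layout used above, with `mkdirs()` so missing parent folders are created as well.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ZipSaver {
    // Builds <downloadRoot>/<regNumber>/<regNumber>_pdfs.zip, creating the folder if needed.
    static File targetFile(File downloadRoot, String regNumber) throws IOException {
        File dir = new File(downloadRoot, regNumber);
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Could not create " + dir);
        }
        return new File(dir, regNumber + "_pdfs.zip");
    }

    // Hypothetical writeToFile: copies the stream in 8 KiB chunks and
    // closes both the input and the output, even if the copy fails.
    static void writeToFile(InputStream in, File file) throws IOException {
        try (InputStream source = in;
             OutputStream out = new FileOutputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = source.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }
}
```

Closing the input stream explicitly is a deliberate choice here: the snippet above never closes `is`, so the sketch takes ownership of it.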
Problem:
The first car works fine and its PDFs are downloaded, but as soon as I search for the next car, when execution reaches this line:
```java
page = page.getElementById("search_gridCampaNoi").click();
```
I get this exception:
```
Exception in thread "main" java.lang.ClassCastException: com.gargoylesoftware.htmlunit.UnexpectedPage cannot be cast to com.gargoylesoftware.htmlunit.html.HtmlPage
```
After debugging, I realized that the moment I make this call:
```java
InputStream is = _link.click().getWebResponse().getContentAsStream();
```
the object returned by `page.getElementById("search_gridCampaNoi").click()` changes from an `HtmlPage` to an `UnexpectedPage` wrapping the zip file's `WebResponse`. So instead of getting a new page, I get back the file I have already downloaded.
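The symptom can be reproduced without HtmlUnit in a small model. In HtmlUnit 2.x, `click()` is declared generically (`<P extends Page> P click()`), so `page = element.click();` compiles and the cast to `HtmlPage` is only checked, and fails, at runtime at the call site. The sketch assumes, as a hypothesis consistent with the debugger output rather than a confirmed reading of HtmlUnit's internals, that a click which does not navigate hands back whatever page the current window holds, and that the zip download left an `UnexpectedPage` there. `Page`, `HtmlPage`, `UnexpectedPage`, `Window` and `PageGuard` below are stand-in classes, not the HtmlUnit ones.

```java
// Stand-ins for the HtmlUnit types; NOT the real library classes.
interface Page {
    String describe();
}

class HtmlPage implements Page {
    public String describe() { return "HtmlPage"; }
}

class UnexpectedPage implements Page {
    public String describe() { return "UnexpectedPage (zip download)"; }
}

// Models a browser window that holds exactly one page at a time.
class Window {
    private Page enclosedPage;

    Window(Page initial) { enclosedPage = initial; }

    void setEnclosedPage(Page page) { enclosedPage = page; }

    // Hypothesis being modelled: a non-navigating click returns the window's
    // current page. The unchecked cast is erased, so no check happens here;
    // the caller's assignment (e.g. to HtmlPage) is where a
    // ClassCastException surfaces.
    @SuppressWarnings("unchecked")
    <P extends Page> P click() { return (P) enclosedPage; }
}

class PageGuard {
    // Returns the page as an HtmlPage if it is one, otherwise null,
    // instead of letting the implicit cast throw.
    static HtmlPage asHtmlPage(Page result) {
        return (result instanceof HtmlPage) ? (HtmlPage) result : null;
    }
}
```

Assigning the click result to a `Page` and checking it with `instanceof` only detects the situation; it does not bring the search page back.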
A few screenshots of the debugger showing this situation:
First call, return type OK:

Second call, the return type has changed and I no longer get an HtmlPage:

Thanks in advance!