HtmlUnit getByXPath returns null

I'm writing Groovy, but I don't think these questions are language-specific.

I actually have two questions.

First question

I'm having a problem with HtmlUnit: it tells me that the element I'm trying to capture is null.

The page I'm testing against: http://browse.deviantart.com/resources/applications/psbrushes/?order=9&offset=0#/dbwam4

My code is:

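    import com.gargoylesoftware.htmlunit.BrowserVersion
    import com.gargoylesoftware.htmlunit.WebClient

    def url = "http://browse.deviantart.com/resources/applications/psbrushes/?order=9&offset=0#/dbwam4"
    def client = new WebClient(BrowserVersion.FIREFOX_3)
    client.javaScriptEnabled = false
    def page = client.getPage(url)
    // coming up empty: nothing matches this path
    def title = page.getByXPath("//html/body/div[4]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a")
    println title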

It just prints: []

Is this because the page uses onclick()? If so, how do I get around it? Enabling JavaScript fills my command prompt with a mess of output.

Second question

I also want to grab the image, but I'm having trouble: when I copy its XPath with Firebug, I get //*[@id="gmi-ResViewSizer_img"]

How do I handle this?

2 answers

Answer to the first question:

 /html/body/div[3]/div/div[3]/div/div/div/div/div/div/div/div/div/div/h1/a 

Your XPath is off by one in the predicate filter on the body's div: it should be the 3rd div, not the 4th. It looks like the site's HTML may have changed since you first captured the XPath with Firebug. You may need to adjust the XPath so it tolerates such changes and is less sensitive to differences in document structure.

Maybe something like this:

 /html/body//div/h1/a 
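To see the off-by-one effect without hitting the live site, here is a small sketch in Java (HtmlUnit is itself a Java library, and the same code runs from Groovy). It uses only the JDK's javax.xml.xpath API, not HtmlUnit, against a hypothetical, heavily simplified page in which body has just three top-level divs; the markup and the class name XPathDriftDemo are invented for illustration and are not deviantART's real HTML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class XPathDriftDemo {
    // Hypothetical, simplified page: body has only THREE top-level divs,
    // so any absolute path that asks for body/div[4] cannot match.
    static final String PAGE =
        "<html><body>"
        + "<div>header</div>"
        + "<div>navigation</div>"
        + "<div><div><h1><a href='/item'>Brush pack</a></h1></div></div>"
        + "</body></html>";

    // Parses well-formed (X)HTML into a DOM using only the JDK.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    // Returns how many nodes the XPath expression selects in the document.
    static int count(Document doc, String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(PAGE);
        System.out.println(count(doc, "/html/body/div[4]/div/h1/a")); // 0: off by one
        System.out.println(count(doc, "/html/body/div[3]/div/h1/a")); // 1
        System.out.println(count(doc, "/html/body//div/h1/a"));       // 1: ignores the exact div indexes
    }
}
```

The absolute path breaks as soon as a single index is wrong, while the descendant-axis version only requires that some div under body directly contains the h1/a.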

Answer to the second question: the XPath you got will work. It may look odd or terse (and may not be the most efficient), but // starts at the document root and searches every node in the tree, * matches any element (including img), and the predicate filter [] keeps only the elements that have an id attribute whose value is "gmi-ResViewSizer_img".

Many other XPath expressions could work; which one to choose also depends on how often the HTML structure changes. This one also selects the img on the linked page:

 /html/body/div/div/div/div/img[1] 
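A similar sketch for the image: the hypothetical markup below nests the img four divs deep, roughly matching the structural path suggested above, and checks that the id-based expression from Firebug and the structural one select the same element. The markup, the file names full.jpg and thumb.jpg, and the class name ImageXPathDemo are invented for illustration, and the evaluation uses the JDK's javax.xml.xpath rather than HtmlUnit.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ImageXPathDemo {
    // Hypothetical markup: the target img carries the id, a second img does not.
    static final String PAGE =
        "<html><body>"
        + "<div><div><div><div>"
        + "<img id='gmi-ResViewSizer_img' src='full.jpg'/>"
        + "<img src='thumb.jpg'/>"
        + "</div></div></div></div>"
        + "</body></html>";

    // Parses well-formed (X)HTML into a DOM using only the JDK.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    // Returns the first element (in document order) the expression selects, or null.
    static Element first(Document doc, String expr) {
        try {
            return (Element) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(PAGE);
        // The expression Firebug produced: any element whose id attribute matches.
        Element byId = first(doc, "//*[@id=\"gmi-ResViewSizer_img\"]");
        // The structural alternative: the first img under four nested divs.
        Element byPath = first(doc, "/html/body/div/div/div/div/img[1]");
        System.out.println(byId.getTagName() + " " + byId.getAttribute("src")); // img full.jpg
        System.out.println(byId.isSameNode(byPath));                            // true
    }
}
```

The id-based expression keeps working if the wrapper divs are added or removed, whereas the structural path has to be updated whenever the nesting changes.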

I had the same problem. I solved it once I figured out that the page uses iframe tags. Try calling

 ((HtmlPage)current_page.getFrames()[n].getEnclosedPage()).getElementByXPath(... 

where n is the index of the frame in the page's frame collection. It works for me!
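The frame trick helps because an iframe's content is a separate document, so an XPath evaluated against the outer page does not descend into it. The sketch below models that with two hypothetical documents parsed separately using the JDK's javax.xml.xpath; it does not use HtmlUnit, where the inner document would instead be reached through the frame's enclosed page as in the call above. The markup, inner.html, and the class name FrameDocumentsDemo are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class FrameDocumentsDemo {
    // Hypothetical outer page: it only contains the iframe element itself.
    static final String OUTER =
        "<html><body><iframe src='inner.html'/></body></html>";
    // Hypothetical content of inner.html, loaded as a separate document.
    static final String INNER =
        "<html><body><h1><a href='/item'>Brush pack</a></h1></body></html>";

    // Parses well-formed (X)HTML into a DOM using only the JDK.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    // Returns how many nodes the XPath expression selects in the document.
    static int count(Document doc, String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        String expr = "//h1/a";
        // The same expression finds nothing in the outer page...
        System.out.println(count(parse(OUTER), expr)); // 0
        // ...but matches once it is evaluated against the frame's own document.
        System.out.println(count(parse(INNER), expr)); // 1
    }
}
```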

Thank you very much.


Source: https://habr.com/ru/post/1334120/

