Get the minimal XPath of an element

Question

I am trying to write a function that returns the XPath of an element. Unfortunately, it returns an absolute XPath, which is not good enough.

I want the shortest XPath possible (or better: a smarter one, not necessarily minimal). For example, if an element has an id, the returned XPath should be based on that id.

I want to reuse this XPath several times, and an absolute XPath is very vulnerable to page changes.

Or, if the parent has an id, return the XPath to the parent by its id and append /child.

Is this possible with lxml or another module?

For example, an XPath helper extension does this better.

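import requests
from lxml import etree

def _load_root(url):
    # Parse the raw response bytes, telling the HTML parser they are UTF-8.
    r = requests.get(url)
    return etree.fromstring(r.content, etree.HTMLParser(encoding='utf-8'))

def get_xpath_by_text(text, url):
    root = _load_root(url)
    # xpath() returns a list of matches; getpath() lives on the ElementTree.
    matches = root.xpath('.//*[contains(text(),"{}")]'.format(text))
    tree = root.getroottree()
    for e in matches:
        print(tree.getpath(e))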

/HTML//DIV [1]/ [1]/ [1]/ [2]/ [1]/ [1]/ [2]/ [2]/ [ 1]// [1]/ [2]/ [2]/ [2]/ [1]/ [1]// [6]/ [2]/ [1 ]

python html python-2.7 xpath web-scraping

Milano Slesarik 29 . '16 20:09

Michael Kay · Answer 1 · 2016-12-29T22:46:04+0000

First, an observation: the shortest XPath is not necessarily the most useful XPath.

The shortest XPath would be something like (//*)[134], which is probably not what you want.

If you want an XPath that uses id() where possible, you could use something like this (pseudocode):

function minimalXPath(Node node) {
  if (exists(node/@id))
    then "id('" + node/@id + "')"
  else if (node is root)
    then ""
  else minimalXPath(node.getParent()) + "/" + node.getName() +
    "[" + node.getSiblingPosition() + "]"
}
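
The pseudocode above could be sketched in Python roughly as follows. This is only an illustration under some assumptions, not a definitive implementation: the name `minimal_xpath` and the `parent_of` callable are hypothetical, the sibling position is taken to mean the position among siblings with the same tag (which is what `name[n]` means in XPath), and the anchor is written as `//*[@id="..."]` rather than `id()`, because `id()` only works when the parser treats the attribute as an ID. The demo uses the standard library's `xml.etree.ElementTree` with a parent map so that it runs without extra dependencies.

```python
import xml.etree.ElementTree as ET


def minimal_xpath(node, parent_of):
    """Return a short XPath for node, anchored at the nearest element with an id.

    parent_of is a callable that returns a node's parent, or None for the root.
    """
    node_id = node.get("id")
    if node_id is not None:
        # Assumption: the id is unique and contains no double quotes.
        return '//*[@id="%s"]' % node_id

    parent = parent_of(node)
    if parent is None:
        # Root element: nothing above it to anchor on.
        return "/" + node.tag

    # XPath's name[n] counts only siblings that share the same tag.
    same_tag = [sib for sib in parent if sib.tag == node.tag]
    position = next(i for i, sib in enumerate(same_tag, 1) if sib is node)
    return "%s/%s[%d]" % (minimal_xpath(parent, parent_of), node.tag, position)


# Small well-formed document used only for this demonstration.
doc = ET.fromstring(
    '<html><body><div id="main"><p>a</p><p><span>target</span></p></div></body></html>'
)
# ElementTree elements do not know their parent, so build a child -> parent map.
parents = {child: parent for parent in doc.iter() for child in parent}

span = doc.find(".//span")
print(minimal_xpath(span, parents.get))
# → //*[@id="main"]/p[2]/span[1]
```

With lxml, elements expose `getparent()`, so the same function could be called as `minimal_xpath(e, lambda n: n.getparent())` without building a parent map.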
