I am trying to write a function that returns the XPath of an element. Unfortunately, it returns an absolute XPath, which is not good enough.
I want the XPath to be as short as possible (or rather smarter, not necessarily minimal). For example, if the element has an id, the returned XPath should be based on that id.
I want to reuse this XPath several times, and an absolute XPath is very fragile when the page changes.
Or, if a parent has an id, return the XPath to that parent by id and append /child to it.
Is this possible with lxml or another module?
For example, an XPath helper extension does this better.
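import requests
from lxml import etree

def _load_root(url):
    r = requests.get(url)
    r.encoding = 'utf-8'
    html = r.content
    return etree.fromstring(html, etree.HTMLParser())

def get_xpath_by_text(text, url):
    root = _load_root(url)
    tree = root.getroottree()  # getpath() is a method of the tree, not of an element
    # xpath() returns a list of matches, so print the path of each one
    for e in root.xpath('.//*[contains(text(),"{}")]'.format(text)):
        print(tree.getpath(e))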
/HTML//DIV [1]/ [1]/ [1]/ [2]/ [1]/ [1]/ [2]/ [2]/ [ 1]// [1]/ [2]/ [2]/ [2]/ [1]/ [1]// [6]/ [2]/ [1 ]
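To make the idea concrete, here is a rough sketch of the id-anchored path described above, assuming lxml. `smart_xpath` is a hypothetical helper name, not part of lxml. It walks up from the element until it finds an ancestor (or the element itself) with an id, and falls back to an absolute path if there is none. It assumes ids are unique on the page and contain no double quotes, and it only handles element nodes.

```python
from lxml import etree


def smart_xpath(element):
    """Hypothetical helper: XPath anchored at the nearest ancestor-or-self with an id."""
    steps = []
    node = element
    while node is not None:
        node_id = node.get("id")
        if node_id:
            # Anchor here. Assumption: the id is unique and has no double quote in it.
            steps.append('//*[@id="{}"]'.format(node_id))
            break
        parent = node.getparent()
        if parent is None:
            # Reached the root without finding an id: fall back to an absolute path.
            steps.append("/" + node.tag)
            break
        # Add a position only when there are several siblings with the same tag
        # (XPath positions are 1-based).
        same_tag = [sib for sib in parent if sib.tag == node.tag]
        if len(same_tag) > 1:
            steps.append("{}[{}]".format(node.tag, same_tag.index(node) + 1))
        else:
            steps.append(node.tag)
        node = parent
    # steps were collected from the element upwards, so reverse them.
    return "/".join(reversed(steps))


# Small made-up document for illustration.
html = ('<html><body><div id="main"><ul><li>a</li>'
        '<li><span>target</span></li></ul></div><p>no id</p></body></html>')
root = etree.fromstring(html, etree.HTMLParser())
span = root.xpath('.//*[contains(text(), "target")]')[0]
print(smart_xpath(span))  # //*[@id="main"]/ul/li[2]/span
```

The resulting path only breaks if the structure below the element with the id changes, which is less fragile than a path from the document root. A real page would probably need more care (duplicate ids, quoting, other stable attributes such as class or name).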