Retrieve the original position of a string from a BeautifulSoup element

When analyzing long, complex HTML documents with BeautifulSoup, it is sometimes useful to get the exact position in the source where an element was found. I cannot just search for the string, as there may be several matching elements, and I would lose bs4's ability to parse the DOM. Given this minimal working example:

import bs4

html = "<div><b>Hello</b>  <i>World</i></div>"
soup = bs4.BeautifulSoup(html, 'lxml')

# Prints 22, the index of "World" in the raw string
print(html.find("World"))

# Prints the tag <i>World</i>; how to get 22 from this?
print(soup.find("i", text="World"))

How can I get the element extracted by bs4 to return 22?
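One possible direction, offered as a sketch rather than a definitive answer: recent Beautiful Soup 4 releases (4.8.1 or later, as far as I know) record `Tag.sourceline` and `Tag.sourcepos` when the document is parsed with `html.parser` instead of `lxml`. Those values point at the `<` of the start tag, not at the text, so the sketch below steps past the start tag by searching for the next `>`. The helper `absolute_offset` is a hypothetical name introduced here, and the `>` search assumes no attribute value contains a literal `>`.

```python
import bs4

html = "<div><b>Hello</b>  <i>World</i></div>"

# Assumption: html.parser is used instead of lxml, because it records
# the position of each start tag in the original markup.
soup = bs4.BeautifulSoup(html, "html.parser")
tag = soup.find("i", string="World")


def absolute_offset(source, line, col):
    """Hypothetical helper: convert a 1-based line number and a
    0-based column into an index into the whole source string."""
    lines = source.split("\n")
    # Each earlier line contributes its length plus one for the "\n".
    return sum(len(text) + 1 for text in lines[:line - 1]) + col


# Index of the "<" that opens the <i> tag.
tag_start = absolute_offset(html, tag.sourceline, tag.sourcepos)

# The element's text begins right after the ">" that closes the start tag
# (assumes no ">" appears inside an attribute value).
text_start = html.index(">", tag_start) + 1

print(tag_start, text_start)
```

Because the element is still located through `soup.find`, the DOM-based matching is kept and only the final offset comes from the raw string.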


Source: https://habr.com/ru/post/1692106/
