When analyzing long, complex HTML documents with BeautifulSoup, it is sometimes useful to get the exact position in the source where an element occurs. I cannot just search for the string, as there may be several matching elements, and I would lose bs4's ability to parse the DOM. Given this minimal working example:
import bs4

html = "<div><b>Hello</b> <i>World</i></div>"
soup = bs4.BeautifulSoup(html, 'lxml')
print(html.find("World"))            # 21: character offset in the raw string
print(soup.find("i", text="World"))  # <i>World</i>: the element, but no position
How can I get the element extracted by bs4 to return 21, the same offset that html.find("World") gives?
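One possible direction, sketched here with the standard library rather than bs4 itself: Python's html.parser.HTMLParser reports a (line, column) pair through getpos() while it is handling each tag or text node, and that pair can be converted into a 0-based character offset. The names TextLocator and absolute_offset are hypothetical helpers made up for this sketch. Recent bs4 releases also document Tag.sourceline and Tag.sourcepos for the html.parser and html5lib builders, which may be the more direct route if the installed version supports them.

```python
from html.parser import HTMLParser


class TextLocator(HTMLParser):
    """Hypothetical helper: records where each text node starts in the source."""

    def __init__(self):
        super().__init__()
        self.hits = []        # (parent_tag, text, line, column) tuples
        self._open_tags = []  # stack of currently open tag names

    def handle_starttag(self, tag, attrs):
        self._open_tags.append(tag)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open tag.
        if self._open_tags and self._open_tags[-1] == tag:
            self._open_tags.pop()

    def handle_data(self, data):
        # getpos() gives a 1-based line and a 0-based column for this text node.
        line, column = self.getpos()
        parent = self._open_tags[-1] if self._open_tags else None
        self.hits.append((parent, data, line, column))


def absolute_offset(source, line, column):
    """Convert a (1-based line, 0-based column) pair to an offset into source."""
    offset = 0
    for _ in range(line - 1):
        offset = source.index("\n", offset) + 1  # skip past each earlier line
    return offset + column


html = "<div><b>Hello</b> <i>World</i></div>"
locator = TextLocator()
locator.feed(html)
locator.close()

# Keep only text nodes equal to "World" whose innermost open tag is <i>.
offsets = [absolute_offset(html, line, col)
           for parent, text, line, col in locator.hits
           if parent == "i" and text == "World"]
print(offsets)
```

For the example string this prints [21], the same 0-based offset as html.find("World"). Because every match is kept along with its parent tag, several elements with the same text can still be told apart.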