Extract node text inside a tag that has a child in beautifulsoup4

The HTML I am processing and cleaning up contains the following code:

<li> <span> 929</span> Serve Returned </li> 

How can I extract only the node text of the <li> ("Serve Returned" in this case) with BeautifulSoup?

.string does not work because <li> has a child element, and .text also returns the text inside the <span>.
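A minimal sketch that reproduces the problem, assuming the built-in "html.parser" backend (the variable names are only illustrative):

```python
from bs4 import BeautifulSoup

html = "<li> <span> 929</span> Serve Returned </li>"
soup = BeautifulSoup(html, "html.parser")
li = soup.li

# <li> has several child nodes (text, <span>, text), so .string is None
string_value = li.string
# .text concatenates every descendant string, including the one in <span>
text_value = li.text

print(repr(string_value))
print(repr(text_value))
```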

2 answers

I used the str.replace method for this:

 >>> li = soup.find('li')  # or however you need to drill down to the <li> tag
 >>> mytext = li.text.replace(li.find('span').text, "").strip()
 >>> print(mytext)
 Serve Returned

Another option is to collect only the direct text nodes of the <li>:

 import bs4

 html = "<li> <span> 929</span> Serve Returned </li>"
 soup = bs4.BeautifulSoup(html, "html.parser")
 print(soup.li.find_all(string=True, recursive=False))

This gives:

 [' ', ' Serve Returned ']

The first element is the "text" that comes before the <span>. This method lets you find the text before, after, and between any child elements.
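To get a single clean string out of that list, the pieces can be stripped and joined. The sketch below assumes the "html.parser" backend; own_text is a hypothetical helper name, not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup

def own_text(tag):
    """Join the direct string children of `tag`, ignoring child elements."""
    # recursive=False restricts the search to direct children of the tag
    pieces = tag.find_all(string=True, recursive=False)
    # Drop whitespace-only pieces and trim the rest before joining
    return " ".join(p.strip() for p in pieces if p.strip())

html = "<li> <span> 929</span> Serve Returned </li>"
soup = BeautifulSoup(html, "html.parser")
result = own_text(soup.li)
print(result)
```

Unlike the str.replace approach, this does not depend on the <span> text being unique within the <li>.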


Source: https://habr.com/ru/post/985714/

