Extract node text inside a tag that has a child in beautifulsoup4

Question

The HTML I am processing and cleaning contains the following code:

<li> <span> 929</span> Serve Returned </li>

How can I extract only the text node of the <li>, "Serve Returned" in this case, with BeautifulSoup?

.string does not work because <li> has a child, and .text also returns the text inside the <span>.

python web-scraping beautifulsoup

user3562812 Apr 22 '15 at 20:21

2 answers

Totem · Answer 1 · 2015-04-22T20:31:37+0000

I used the str.replace method for this:

    >>> li = soup.find('li')  # or however you need to drill down to the <li> tag
    >>> mytext = li.text.replace(li.find('span').text, "")
    >>> print mytext
    Serve Returned
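For anyone on Python 3, here is a minimal sketch of the same replace-based idea. It assumes the built-in "html.parser" backend and the sample markup from the question. The variable names `span_text` and `own_text` are just illustrative. Note that `str.replace` removes every occurrence by default, so the sketch passes a count of 1 in case the span's text also appears elsewhere in the `<li>`.

```python
from bs4 import BeautifulSoup

html = "<li> <span> 929</span> Serve Returned </li>"
soup = BeautifulSoup(html, "html.parser")  # parser choice is an assumption

li = soup.find("li")
span_text = li.find("span").get_text()

# Remove only the first occurrence of the span's text, then trim whitespace.
own_text = li.get_text().replace(span_text, "", 1).strip()
print(own_text)
```

This still relies on string matching, so it can misfire if the span's text happens to occur earlier in the `<li>` than the span itself.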

Hooked · Answer 2 · 2015-04-22T20:34:42+0000

    import bs4

    html = r"<li> <span> 929</span> Serve Returned </li>"
    soup = bs4.BeautifulSoup(html)
    print soup.li.findAll(text=True, recursive=False)

This gives:

 [u' ', u' Serve Returned ']

The first element is the whitespace "text" that comes before the <span>. This method can help you find text before, after, and between any child elements.
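A possible Python 3 variant of this approach, as a rough sketch: it assumes a Beautiful Soup version where `find_all` accepts the `string` argument (the newer spelling of `text`) and uses the built-in "html.parser" backend. The helper name `direct_text` is hypothetical, not part of the library. It drops the whitespace-only pieces and joins what is left.

```python
from bs4 import BeautifulSoup

def direct_text(tag):
    """Hypothetical helper: join the text nodes that are direct children of tag."""
    # recursive=False limits the search to direct children, so text
    # inside nested tags such as <span> is skipped.
    pieces = tag.find_all(string=True, recursive=False)
    # Discard whitespace-only nodes and trim the rest.
    return " ".join(p.strip() for p in pieces if p.strip())

html = "<li> <span> 929</span> Serve Returned </li>"
soup = BeautifulSoup(html, "html.parser")  # parser choice is an assumption
print(direct_text(soup.li))
```

Joining with a single space is a design choice for readability. If you need the exact original spacing, work with the raw list instead.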
