Removing span tags from soup in BeautifulSoup / Python

I have a soup in Python that looks like this:

<p>
 <span style="text-decoration: underline; color: #3366ff;">
   Title:
 </span>
 Info
</p>
<p>
 <span style="color: #3366ff;">
  <span style="text-decoration: underline;">
   Title2:
  </span>
 </span>
 Info2
</p>

I want it to look like this:

<p>
   Title:
 Info
</p>
<p>
   Title2:
 Info2
</p>

Is there any way to do this with bs4?

2 answers

For this you can use BeautifulSoup's unwrap() method, which removes a tag but keeps its contents in place:

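import bs4

# htm1 holds the HTML from the question as a string
soup1 = bs4.BeautifulSoup(htm1, 'html.parser')
# unwrap() replaces each <span> with its children, so the text is kept
for match in soup1.find_all('span'):
    match.unwrap()
print(soup1)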
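
As a self-contained sketch of the same idea, the snippet below applies unwrap() to the markup from the question. The helper name strip_spans and the variable html are only illustrative, not part of bs4. The whitespace that surrounded the span tags stays in the output, so the result is not indented exactly like the desired markup.

```python
from bs4 import BeautifulSoup

# The markup from the question, stored in a string (variable name is illustrative).
html = """<p>
 <span style="text-decoration: underline; color: #3366ff;">
   Title:
 </span>
 Info
</p>
<p>
 <span style="color: #3366ff;">
  <span style="text-decoration: underline;">
   Title2:
  </span>
 </span>
 Info2
</p>"""


def strip_spans(markup):
    """Hypothetical helper: parse markup and unwrap every <span>."""
    soup = BeautifulSoup(markup, "html.parser")
    # find_all returns outer spans before nested ones; unwrapping the outer
    # span moves the inner one up into <p>, where it is unwrapped in turn.
    for span in soup.find_all("span"):
        span.unwrap()
    return soup


cleaned = strip_spans(html)
print(cleaned)
```

If the leftover blank lines matter, the text nodes can be normalized afterwards; that step is left out here to keep the sketch short.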

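You can also call replace_with on the span tags. Note that replacing a tag with an empty string removes the tag together with its contents, so this only fits if the text inside the span should be dropped as well: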

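from bs4 import BeautifulSoup

# html holds the markup as a string
soup = BeautifulSoup(html, 'html.parser')
# replace_with('') drops each <span> together with everything inside it
for span_tag in soup.find_all('span'):
    span_tag.replace_with('')
print(soup)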
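
To see how the two approaches differ, here is a small sketch that runs both on the same one-line snippet. The snippet and the variable names dropped and kept are made up for illustration.

```python
from bs4 import BeautifulSoup

# A shortened, illustrative version of the markup from the question.
snippet = "<p><span>Title:</span> Info</p>"

# replace_with(''): the span and the text inside it both disappear.
dropped = BeautifulSoup(snippet, "html.parser")
for span in dropped.find_all("span"):
    span.replace_with("")

# unwrap(): only the span tag itself disappears, its text stays.
kept = BeautifulSoup(snippet, "html.parser")
for span in kept.find_all("span"):
    span.unwrap()

print(repr(dropped.get_text()))  # ' Info'
print(repr(kept.get_text()))     # 'Title: Info'
```

For the output asked for in the question, where "Title:" has to survive, unwrap() is the variant that matches.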

Source: https://habr.com/ru/post/1532410/

