BeautifulSoup - How to get all the text between two different tags?

Question

I would like to get all the text between two tags:

    <div class="lead">I DONT WANT this</div>
    #many different tags - p, table, h2 including text that I want
    <div class="image">...</div>

I started this way:

    import urllib.request
    from bs4 import BeautifulSoup

    url = "http://......."
    req = urllib.request.Request(url)
    source = urllib.request.urlopen(req)
    soup = BeautifulSoup(source, 'lxml')
    start = soup.find('div', {'class': 'lead'})
    end = soup.find('div', {'class': 'image'})

And I have no idea what to do next.

+5

python beautifulsoup

Alek SZ Jul 27 '17 at 9:39

2 answers

herokingsley · Answer 1 · 2017-07-27T10:22:23+0000

Try using the following code:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup("""
    <html><div class="lead">lead</div>data<div class="end"></div></html>
    """, "lxml")

    node = soup.find('div', {'class': 'lead'})
    s = []
    while node is not None:
        # Move to the next sibling of the current node.
        node = node.next_sibling
        # Stop when the siblings run out or the tag with class "end" is reached.
        if node is None or (hasattr(node, "attrs") and "end" in node.attrs.get('class', [])):
            break
        s.append(node)
    print(s)

This uses next_sibling to step through the sibling nodes that follow the start tag, collecting each one until the end tag is reached.
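Since the question asks for the text rather than the nodes, here is a minimal sketch of the same sibling walk that keeps only text. The helper name text_between and the sample HTML are made up for illustration, and it assumes the start and end tags are siblings (share the same parent). It uses the built-in "html.parser" so the example has no lxml dependency.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def text_between(start, end):
    """Collect the text of every sibling after `start`, stopping at `end`."""
    parts = []
    for node in start.next_siblings:
        # Identity check: stop at this exact tag, not one that merely looks equal.
        if node is end:
            break
        if isinstance(node, Tag):
            # Tags: gather all nested text, separated by single spaces.
            text = node.get_text(" ", strip=True)
        else:
            # Bare strings sitting directly between the tags.
            text = str(node).strip()
        if text:
            parts.append(text)
    return "\n".join(parts)


# Hypothetical sample markup modelled on the question.
html = """
<div class="lead">I DONT WANT this</div>
<h2>Heading</h2>
<p>First <b>paragraph</b></p>
loose text
<div class="image">...</div>
<p>after</p>
"""
soup = BeautifulSoup(html, "html.parser")
start = soup.find("div", {"class": "lead"})
end = soup.find("div", {"class": "image"})
result = text_between(start, end)
print(result)
```

If the end tag is not among the siblings, the loop simply runs to the last sibling, so it may be worth checking that both find() calls returned something before calling the helper.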

matsbauer · Answer 2 · 2017-07-27T10:35:00+0000

Try this code. It starts at the div with class lead, walks through the siblings that follow it, and stops when it reaches the div with class image. The loop collects every node in between as markup and prints the result, and it can be adapted to print something else:

    end = soup.find("div", {"class": "image"})
    html = ""
    for tag in soup.find("div", {"class": "lead"}).next_siblings:
        # Stop once the div with class "image" itself is reached.
        if tag is end:
            break
        html += str(tag)
    print(html)
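Both answers rely on the two divs being siblings. If they sit at different nesting levels, one possible variation is to walk next_elements (document order) instead of next_siblings. This is only a sketch under that assumption: strings_between and the sample markup are hypothetical, and comments or other special strings would be picked up too, since they are also NavigableString objects.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def strings_between(start, end):
    """Yield non-empty strings found after `start` and before `end` in document order."""
    # next_elements begins with the children of `start`, so remember them and skip them.
    inside_start = set(id(element) for element in start.descendants)
    for element in start.next_elements:
        if element is end:
            break
        if isinstance(element, NavigableString) and id(element) not in inside_start:
            text = element.strip()
            if text:
                yield text


# Hypothetical markup where the two divs have different parents.
html = (
    '<div><div class="lead">skip</div><p>one</p></div>'
    '<section><h2>two</h2><div class="image">x</div><p>after</p></section>'
)
soup = BeautifulSoup(html, "html.parser")
start = soup.find("div", {"class": "lead"})
end = soup.find("div", {"class": "image"})
found = list(strings_between(start, end))
print(found)
```

As with the sibling version, a missing end tag means the walk continues to the end of the document.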
