Find all tags of a specific class only after a tag with specific text

Question


I have a long HTML table whose rows sit side by side (the tags are not nested). It looks like this:

<tr>
    <td>A</td>
</tr>
<tr>
    <td class="x">...</td>
    <td class="x">...</td>
    <td class="x">...</td>
    <td class="x">...</td>
</tr>
<tr>
    <td class="y">...</td>
    <td class="y">...</td>
    <td class="y">...</td>
    <td class="y">...</td>
</tr>
<tr>
    <td>B</td>
</tr>
<tr>
    <td class="x">...</td>
    <td class="x">...</td>
    <td class="x">...</td>
    <td class="x">...</td>
</tr>
<tr>
    <td class="y">I want this</td>
    <td class="y">and this</td>
    <td class="y">and this</td>
    <td class="y">and this</td>
</tr>

First I want to search the tree to find "B". Then I want to capture the text of each td with class y after "B", but before the row that starts with "C".

I tried this:

results = soup.find_all('td')
for result in results:
    if result.string == "B":
        print(result.string)

This prints "B", which is what I want. But when I try to find everything after it, I don't get what I want.

for result in soup.find_all('td'):
    if result.string == 'B':
        a = result.find_next('td', class_='y')

This gives me the first td with class y after "B", which is correct, but only that one tag. I want to capture every td with class y after "B" but before "C" ("C" does not appear in the HTML above, but it follows the same pattern), and I want the results in a list.

The list I want to end up with:

[['I want this'],['and this'],['and this'],['and this']]
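For reference, `find_next()` returns only the first matching element after the starting tag, while `find_all_next()` returns every match in document order. The minimal sketch below runs both on a trimmed version of the table; the `html` string and its extra "C" section are illustrative assumptions, not part of the question. It shows that `find_all_next()` on its own collects all the class y cells but does not stop at "C":

```python
from bs4 import BeautifulSoup

# Trimmed sample of the question's table, plus an assumed "C" section
html = """
<tr><td>B</td></tr>
<tr><td class="x">...</td></tr>
<tr><td class="y">I want this</td><td class="y">and this</td></tr>
<tr><td>C</td></tr>
<tr><td class="y">not this</td></tr>
"""

soup = BeautifulSoup(html, "html.parser")
b_cell = soup.find("td", string="B")

# find_next(): only the first td.y after "B"
first = b_cell.find_next("td", class_="y")
print(first.get_text())  # I want this

# find_all_next(): every td.y after "B", including the one past "C"
after_b = [td.get_text() for td in b_cell.find_all_next("td", class_="y")]
print(after_b)  # ['I want this', 'and this', 'not this']
```

That last result is why some explicit stop condition at the "C" row is needed.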


python html html-parsing beautifulsoup

strahanstoothgap · asked Oct 2 '15 at 0:43


alecxe · Accepted Answer · 2015-10-02T01:36:30+0000

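First find the td containing the text "B" and go up to its parent tr.

Then iterate over the following tr siblings with find_next_siblings():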

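from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # html: the table markup from the question

# find the td with text "B" and go up to its row (tr)
start = soup.find("td", string="B").parent
for tr in start.find_next_siblings("tr"):
    # exit if reached C
    if tr.find("td", string="C"):
        break

    # get all tds with the desired class and print their text
    for td in tr.find_all("td", class_="y"):
        print(td.get_text())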

This prints:

I want this
and this
and this
and this
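The question asks for a list of single-item lists rather than printed lines. The sketch below adapts the sibling walk above to return that shape. The `collect_after` helper, its parameters, and the sample `html` (including the "A" and "C" sections) are hypothetical, added only for illustration:

```python
from bs4 import BeautifulSoup


def collect_after(soup, start_text, stop_text, cls="y"):
    """Hypothetical helper: wrap the text of each td.<cls> found in the rows
    after the row containing start_text and before the row containing stop_text."""
    start_row = soup.find("td", string=start_text).parent
    result = []
    for tr in start_row.find_next_siblings("tr"):
        # stop at the next section header row
        if tr.find("td", string=stop_text):
            break
        for td in tr.find_all("td", class_=cls):
            result.append([td.get_text()])
    return result


html = """
<tr><td>A</td></tr>
<tr><td class="y">skip</td></tr>
<tr><td>B</td></tr>
<tr><td class="x">...</td><td class="x">...</td></tr>
<tr><td class="y">I want this</td><td class="y">and this</td>
    <td class="y">and this</td><td class="y">and this</td></tr>
<tr><td>C</td></tr>
<tr><td class="y">not this</td></tr>
"""

soup = BeautifulSoup(html, "html.parser")
print(collect_after(soup, "B", "C"))
# [['I want this'], ['and this'], ['and this'], ['and this']]
```

If the "C" row is absent, the loop simply runs to the end of the table. If the start text is absent, `soup.find()` returns `None` and `.parent` raises `AttributeError`, so real code may want a guard there.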
