I have a large HTML table whose rows are not nested: each label row ("A", "B", ...) is a sibling of the data rows that follow it. It looks like this:
```html
<tr>
  <td>A</td>
</tr>
<tr>
  <td class="x">...</td>
  <td class="x">...</td>
  <td class="x">...</td>
  <td class="x">...</td>
</tr>
<tr>
  <td class="y">...</td>
  <td class="y">...</td>
  <td class="y">...</td>
  <td class="y">...</td>
</tr>
<tr>
  <td>B</td>
</tr>
<tr>
  <td class="x">...</td>
  <td class="x">...</td>
  <td class="x">...</td>
  <td class="x">...</td>
</tr>
<tr>
  <td class="y">I want this</td>
  <td class="y">and this</td>
  <td class="y">and this</td>
  <td class="y">and this</td>
</tr>
```
First I want to find the `td` containing "B". Then I want to capture the text of each `td` with class `y` after "B", but before the next label row, which starts with "C".
I tried this:
```python
results = soup.find_all('td')
for result in results:
    if result.string == "B":
        print(result.string)
```
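As a sketch, the loop above can be replaced by a single call: BeautifulSoup's `find()` accepts a `string` argument that matches a tag whose text is exactly the given value. The HTML below is a trimmed copy of the sample table, assumed here for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample table (assumed input for illustration).
html = """
<tr><td>A</td></tr>
<tr><td class="y">...</td></tr>
<tr><td>B</td></tr>
<tr><td class="y">I want this</td></tr>
"""

soup = BeautifulSoup(html, "html.parser")

# string= matches a <td> whose only text is exactly "B"; find() returns
# the first such tag, or None if there is none.
label = soup.find("td", string="B")
print(label.get_text())
```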
This prints "B", as expected. But when I try to get everything after it, I don't get what I want:
```python
for result in soup.find_all('td'):
    if result.string == 'B':
        a = result.find_next('td', class_='y')
```
This returns the first `td` with class `y` after "B", which is correct, but only that one tag. I want all the `td` tags with class `y` that come after "B" and before "C" (C does not appear in the excerpt above, but follows the same pattern), collected in a list.
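`find_next()` stops at the first match; its counterpart `find_all_next()` returns every later match in document order. One way to get all the `y` cells is therefore to walk the `<td>` tags after "B" and stop at the next label cell. This is a minimal sketch that assumes a label cell is any `<td>` without a `class` attribute, as in the sample; the "C" section in the HTML below is made up to show where the loop stops.

```python
from bs4 import BeautifulSoup

# Sample table; the "C" section is hypothetical, added to show the stop condition.
html = """
<tr><td>A</td></tr>
<tr><td class="x">...</td><td class="x">...</td></tr>
<tr><td class="y">...</td><td class="y">...</td></tr>
<tr><td>B</td></tr>
<tr><td class="x">...</td><td class="x">...</td><td class="x">...</td><td class="x">...</td></tr>
<tr><td class="y">I want this</td><td class="y">and this</td><td class="y">and this</td><td class="y">and this</td></tr>
<tr><td>C</td></tr>
<tr><td class="y">not this</td></tr>
"""

soup = BeautifulSoup(html, "html.parser")
label = soup.find("td", string="B")

summary = []
for td in label.find_all_next("td"):
    classes = td.get("class")  # a list such as ["y"], or None if there is no class
    if classes is None:
        break  # assumed: an unclassed <td> is the next label ("C")
    if "y" in classes:
        summary.append([td.get_text(strip=True)])

print(summary)  # [['I want this'], ['and this'], ['and this'], ['and this']]
```

The stop condition is the important part: without it, `find_all_next()` would keep going past "C" to the end of the document.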
The list I want to end up with:
```python
[['I want this'], ['and this'], ['and this'], ['and this']]
```
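If the same kind of list is needed for every label (A, B, C, ...), a sketch that walks the rows once and groups the `y` cells under the most recent label may be simpler than repeating the search per label. It again assumes that a label row is a row whose cells carry no `class`; the function name `group_by_label` and the cell texts `a1`/`c1` are hypothetical.

```python
from bs4 import BeautifulSoup

def group_by_label(soup):
    """Map each label row's text to a list of [text] entries for its class="y" cells."""
    groups = {}
    current = None
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        # Assumed: a label row has no classed cells, e.g. <tr><td>B</td></tr>.
        if cells and all(td.get("class") is None for td in cells):
            current = row.get_text(strip=True)
            groups[current] = []
        elif current is not None:
            groups[current].extend(
                [td.get_text(strip=True)] for td in row.find_all("td", class_="y")
            )
    return groups

html = """
<tr><td>A</td></tr>
<tr><td class="y">a1</td></tr>
<tr><td>B</td></tr>
<tr><td class="x">...</td></tr>
<tr><td class="y">I want this</td><td class="y">and this</td></tr>
<tr><td>C</td></tr>
<tr><td class="y">c1</td></tr>
"""

result = group_by_label(BeautifulSoup(html, "html.parser"))
print(result["B"])  # [['I want this'], ['and this']]
```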