This is the nasty piece of HTML you have. If we ignore the semantics of table rows and table cells for a moment and consider it as pure XML, its structure is as follows:
    <tr>
        <td>1
            <td>
                <td>20
                    <td>5%</td>
                </td>
            </td>
        </td>
    </tr>
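To see that pure-XML reading for yourself, here is a small sketch using Python's standard-library `xml.etree.ElementTree`, a strict XML parser that knows nothing about HTML tables. The helper name `nesting` is made up for this illustration; it just walks the tree and records how deep each element sits.

```python
import xml.etree.ElementTree as ET

# The same row as in the question, parsed as strict XML.
row = "<tr><td>1<td><td>20<td>5%</td></td></td></td></tr>"
root = ET.fromstring(row)


def nesting(element, depth=0):
    """Return (depth, tag, text) for an element and its descendants, in document order."""
    text = (element.text or "").strip()
    found = [(depth, element.tag, text)]
    for child in element:
        found.extend(nesting(child, depth + 1))
    return found


# Print one line per element, indented by nesting depth.
for depth, tag, text in nesting(root):
    print("    " * depth + f"<{tag}> {text!r}")
```

Read as XML, every td is a child of the previous one, so the 5% cell ends up four levels below the tr.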
BeautifulSoup, however, knows about the semantics of HTML tables and instead analyzes it as follows:
    <tr>
        <td>1</td>
        <td></td>
        <td>20</td>
        <td>5%</td>
    </tr>
... so, as you say, 1 and 20 are in the first and third td elements (not tags) respectively.
In fact, you can get the contents of these td elements as follows:
    >>> from bs4 import BeautifulSoup
    >>> soup = BeautifulSoup("<tr><td>1<td><td>20<td>5%</td></td></td></td></tr>")
    >>> tr = soup.find("tr")
    >>> tr
    <tr><td>1</td><td></td><td>20</td><td>5%</td></tr>
    >>> td_list = tr.find_all("td")
    >>> td_list
    [<td>1</td>, <td></td>, <td>20</td>, <td>5%</td>]
    >>> td_list[0]  # Python starts counting list items from 0, not 1
    <td>1</td>
    >>> td_list[0].text
    '1'
    >>> td_list[2].text
    '20'
    >>> td_list[3].text
    '5%'