How to use BeautifulSoup to search for a list of tags, with one element in the list having an attribute?

Question

Does anyone know how to use bs4 in Python to search for multiple tags when one of them must also have an attribute?

For example, to search for all occurrences of a single tag with an attribute, I know that I can do this:

tr_list = soup_object.find_all('tr', id=True)

And I know that I can do this too:

tag_list = soup_object.find_all(['a', 'b', 'p', 'li'])

But I can't work out how to combine the two statements. In theory, I want a single list of all these HTML tags in the order in which they appear in the document, where every tr tag included has an id.
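For reference, here is a small check of why simply adding id=True to the list form does not combine the two: keyword filters in find_all apply to every name in the list, so only tags that are both in the list and carry an id are returned. The HTML string is a condensed copy of the question's snippet, assumed for illustration.

```python
from bs4 import BeautifulSoup

# Condensed copy of the question's first two rows (assumed input)
html = """
<tr id="uniqueID">
  <td><b>A_time_as_text</b></td>
  <td class="storyTitle"><a href="a_link.com">some_text</a><b>a_headline_as_text</b></td>
</tr>
<tr>
  <td class="st-Art"><ul><li>more_text</li></ul></td>
</tr>
"""

soup = BeautifulSoup(html, "html.parser")

# id=True is applied to every name in the list, so a, b and li tags
# without an id are excluded along with the tr that lacks one
combined = soup.find_all(['a', 'b', 'p', 'li', 'tr'], id=True)
print([tag.name for tag in combined])  # ['tr']
```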

The HTML snippet will look something like this:

  <tr id="uniqueID">
   <td nowrap="" valign="baseline" width="8%">
    <b>
     A_time_as_text
    </b>
   </td>
   <td class="storyTitle">
    <a href="a_link.com" target="_new">
     some_text
    </a>
    <b>
     a_headline_as_text
    </b>
    a_number_as_text
   </td>
  </tr>
  <tr>
   <td>
    <br/>
   </td>
   <td class="st-Art">
    <ul>
     <li>
      more_text_text_text
      <strong>
       more_text_text_text
       <font color="228822">
        more_text_text_text
       </font>
      </strong>
      more_text_text_text
     </li>
     <li>
      more_text_text_text
      <ul>
       <li>
        more_text_text_text
       </li>
      </ul>
     </li>
    </ul>
   </td>
  </tr>
  <tr>
  </tr>

Thanks for the help!

python html web scraping beautifulsoup

Sammy doodle Jan 31 '18 at 3:53

1 answer

Martin Evans · Accepted Answer · 2018-01-31T12:34:20+0000

You could add tr to your list and then skip any tr that does not have an id:

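from bs4 import BeautifulSoup

# html holds the HTML from the question as a string
soup = BeautifulSoup(html, "html.parser")

# Search for all the wanted tags (including tr) in document order,
# then skip any tr that has no id
for tag in soup.find_all(['a', 'b', 'p', 'li', 'tr']):
    if tag.name != 'tr' or tag.get('id'):
        print(tag.name)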

For your HTML, this would display:

tr
b
a
b
li
li
li
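Another option, offered as a sketch rather than as part of the answer above, is to pass a function to find_all: BeautifulSoup calls it on every tag and keeps the tags for which it returns True, so the whole condition lives in one call. The helper name wanted_tag and the condensed html string are illustrative assumptions. Using has_attr('id') mirrors id=True, which also matches an empty id; the answer's tag.get('id') would skip a tr with id="".

```python
from bs4 import BeautifulSoup

# Condensed copy of the question's snippet (assumed input)
html = """
<tr id="uniqueID">
  <td><b>A_time_as_text</b></td>
  <td class="storyTitle"><a href="a_link.com">some_text</a><b>a_headline_as_text</b>a_number_as_text</td>
</tr>
<tr>
  <td class="st-Art">
    <ul>
      <li>more_text <strong>more_text</strong></li>
      <li>more_text<ul><li>more_text</li></ul></li>
    </ul>
  </td>
</tr>
<tr></tr>
"""

WANTED = {'a', 'b', 'p', 'li'}

def wanted_tag(tag):
    """Hypothetical filter: keep a/b/p/li tags, and tr tags that have an id."""
    if tag.name == 'tr':
        return tag.has_attr('id')
    return tag.name in WANTED

soup = BeautifulSoup(html, "html.parser")
names = [tag.name for tag in soup.find_all(wanted_tag)]
print(names)
```

Because find_all walks the tree from top to bottom, the matches keep their document order, just as in the loop above.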

Alternatively, if you only want the a, b, p and li tags that are inside a tr with an id, you could do the following:

# Only look inside tr tags that have an id attribute
for tr in soup.find_all('tr', id=True):
    for tag in tr.find_all(['a', 'b', 'p', 'li']):
        print(tag.name, tag.get_text(strip=True))

Giving you:

b A_time_as_text
a some_text
b a_headline_as_text
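If you prefer CSS selectors, soup.select can express both variants. This is a swapped-in alternative, not the answer's method, and it relies on the soupsieve package, which recent bs4 releases install as a dependency; the condensed html string is again an assumption. The selector tr[id] matches any tr that has an id attribute, and the matches are returned in document order.

```python
from bs4 import BeautifulSoup

# Condensed copy of the question's snippet (assumed input)
html = """
<tr id="uniqueID">
  <td><b>A_time_as_text</b></td>
  <td class="storyTitle"><a href="a_link.com">some_text</a><b>a_headline_as_text</b>a_number_as_text</td>
</tr>
<tr>
  <td class="st-Art">
    <ul>
      <li>more_text <strong>more_text</strong></li>
      <li>more_text<ul><li>more_text</li></ul></li>
    </ul>
  </td>
</tr>
<tr></tr>
"""

soup = BeautifulSoup(html, "html.parser")

# First variant: every a/b/p/li, plus tr elements that carry an id
everything = soup.select('tr[id], a, b, p, li')
print([tag.name for tag in everything])

# Second variant: only a/b/p/li nested inside a tr that has an id
inside = soup.select('tr[id] a, tr[id] b, tr[id] p, tr[id] li')
for tag in inside:
    print(tag.name, tag.get_text(strip=True))
```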
