Get the href attribute of a link inside a td tag with BeautifulSoup (Python)

I'm new to Python, and someone suggested I use Beautiful Soup for scraping. I ran into a problem extracting the href attribute from the link in column 2 (a td tag) for rows whose filing date in column 4 falls in a given year.

    <table class="tableFile2" summary="Results">
      <tr>
        <th width="7%" scope="col">Filings</th>
        <th width="10%" scope="col">Format</th>
        <th scope="col">Description</th>
        <th width="10%" scope="col">Filing Date</th>
        <th width="15%" scope="col">File/Film Number</th>
      </tr>
      <tr>
        <td nowrap="nowrap">8-K</td>
        <td nowrap="nowrap"><a href="/Archives/edgar/data/320193/000119312513199324/0001193125-13-199324-index.htm" id="documentsbutton">&nbsp;Documents</a></td>
        <td class="small" >Current report, items 8.01 and 9.01 <br />Acc-no: 0001193125</td>
        <td>2013-05-03</td>
        <td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&amp;filenum=000-10030&amp;owner=include&amp;count=40">000-10030</a><br>13813281 </td>
      </tr>
      <tr class="blueRow">
        <td nowrap="nowrap">424B2</td>
        <td nowrap="nowrap"><a href="/Archives/edgar/data/320193/000119312513191849/0001193125-13-191849-index.htm" id="documentsbutton">&nbsp;Documents</a></td>
        <td class="small" >Prospectus [Rule 424(b)(2)]<br />Acc-no: 0001193125</td>
        <td>2013-05-01</td>
        <td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&amp;filenum=333-188191&amp;owner=include&amp;count=40">333-188191</a><br>13802405 </td>
      </tr>
      <tr>
        <td nowrap="nowrap">FWP</td>
        <td nowrap="nowrap"><a href="/Archives/edgar/data/320193/000119312513189053/0001193125-13-189053-index.htm" id="documentsbutton">&nbsp;Documents</a></td>
        <td class="small" >Filing under Securities Act Rules 163/433 of free writing prospectuses<br />Acc-no: 0001193125-13-189053&nbsp;(34 Act)&nbsp; Size: 52 KB </td>
        <td>2013-05-01</td>
        <td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&amp;filenum=333-188191&amp;owner=include&amp;count=40">333-188191</a><br>13800170 </td>
      </tr>
    </table>

    table = soup.find('table', class="tableFile2")
    rows = table.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        if "2013" in cols[3]
            link = cols[1].find('a').get('href')
            print
1 answer

This works for me in Python 2.7:

    table = soup.find('table', {'class': 'tableFile2'})
    rows = table.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        if len(cols) >= 4 and "2013" in cols[3].text:
            link = cols[1].find('a').get('href')
            print link

A few problems with your previous code:

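  • class is a reserved word in Python, so soup.find('table', class="tableFile2") is a syntax error. Pass an attribute dictionary instead (e.g. {'class': 'tableFile2'}); Beautiful Soup 4.1.2 and later also accept class_='tableFile2'.
  • Not every row has at least 4 td cells (the header row contains only th cells, so cols is empty there), so check the length before indexing cols[3].
  • cols[3] is a Tag, not a string, so test cols[3].text; the if line also needs a trailing colon.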
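For anyone on Python 3, the points above can be put together in a small self-contained sketch. It assumes the bs4 package is installed and uses the standard library's html.parser backend. The names links_for_year and HTML are hypothetical, chosen only for this illustration, and the markup is a trimmed copy of the table from the question. It also uses startswith rather than a substring test, on the assumption that the date column is always formatted YYYY-MM-DD.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the question (header row plus two data rows).
HTML = """
<table class="tableFile2" summary="Results">
  <tr>
    <th>Filings</th><th>Format</th><th>Description</th>
    <th>Filing Date</th><th>File/Film Number</th>
  </tr>
  <tr>
    <td>8-K</td>
    <td><a href="/Archives/edgar/data/320193/000119312513199324/0001193125-13-199324-index.htm">Documents</a></td>
    <td>Current report, items 8.01 and 9.01</td>
    <td>2013-05-03</td>
    <td>000-10030</td>
  </tr>
  <tr class="blueRow">
    <td>424B2</td>
    <td><a href="/Archives/edgar/data/320193/000119312513191849/0001193125-13-191849-index.htm">Documents</a></td>
    <td>Prospectus [Rule 424(b)(2)]</td>
    <td>2013-05-01</td>
    <td>333-188191</td>
  </tr>
</table>
"""


def links_for_year(html, year):
    """Return the column-2 hrefs of rows whose column-4 date starts with `year`.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # `class_` (trailing underscore) avoids the reserved word `class`.
    table = soup.find("table", class_="tableFile2")
    if table is None:
        return []

    links = []
    for tr in table.find_all("tr"):
        cols = tr.find_all("td")
        # The header row has only <th> cells, so `cols` is empty for it.
        if len(cols) < 4:
            continue
        # Compare against the cell's text, not the Tag object itself.
        if not cols[3].get_text(strip=True).startswith(str(year)):
            continue
        anchor = cols[1].find("a")
        if anchor is not None and anchor.get("href"):
            links.append(anchor["href"])
    return links


if __name__ == "__main__":
    for link in links_for_year(HTML, 2013):
        print(link)
```

Collecting the links in a list instead of printing inside the loop makes the result easier to reuse, for example to join each relative href with the site's base URL before fetching it.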

Source: https://habr.com/ru/post/1482540/

