Get the href attribute of a link inside a td tag with BeautifulSoup in Python

Question

I'm new to Python, and someone suggested I use Beautiful Soup for scraping. I ran into a problem extracting the href attribute from the td in column 2 of each row, based on the year in column 4.

    <table class="tableFile2" summary="Results">
      <tr>
        <th width="7%" scope="col">Filings</th>
        <th width="10%" scope="col">Format</th>
        <th scope="col">Description</th>
        <th width="10%" scope="col">Filing Date</th>
        <th width="15%" scope="col">File/Film Number</th>
      </tr>
      <tr>
        <td nowrap="nowrap">8-K</td>
        <td nowrap="nowrap"><a href="/Archives/edgar/data/320193/000119312513199324/0001193125-13-199324-index.htm" id="documentsbutton">&nbsp;Documents</a></td>
        <td class="small" >Current report, items 8.01 and 9.01 <br />Acc-no: 0001193125</td>
        <td>2013-05-03</td>
        <td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&amp;filenum=000-10030&amp;owner=include&amp;count=40">000-10030</a><br>13813281 </td>
      </tr>
      <tr class="blueRow">
        <td nowrap="nowrap">424B2</td>
        <td nowrap="nowrap"><a href="/Archives/edgar/data/320193/000119312513191849/0001193125-13-191849-index.htm" id="documentsbutton">&nbsp;Documents</a></td>
        <td class="small" >Prospectus [Rule 424(b)(2)]<br />Acc-no: 0001193125</td>
        <td>2013-05-01</td>
        <td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&amp;filenum=333-188191&amp;owner=include&amp;count=40">333-188191</a><br>13802405 </td>
      </tr>
      <tr>
        <td nowrap="nowrap">FWP</td>
        <td nowrap="nowrap"><a href="/Archives/edgar/data/320193/000119312513189053/0001193125-13-189053-index.htm" id="documentsbutton">&nbsp;Documents</a></td>
        <td class="small" >Filing under Securities Act Rules 163/433 of free writing prospectuses<br />Acc-no: 0001193125-13-189053&nbsp;(34 Act)&nbsp; Size: 52 KB </td>
        <td>2013-05-01</td>
        <td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&amp;filenum=333-188191&amp;owner=include&amp;count=40">333-188191</a><br>13800170 </td>
      </tr>
    </table>

    table = soup.find('table', class="tableFile2")
    rows = table.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        if "2013" in cols[3]
            link = cols[1].find('a').get('href')
            print

python beautifulsoup

Zaid iqbal May 24 '13 at 10:40

1 answer

Charles Marsh · Accepted Answer · 2013-06-05T21:37:04+0000

This works for me in Python 2.7:

    table = soup.find('table', {'class': 'tableFile2'})
    rows = table.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        if len(cols) >= 4 and "2013" in cols[3].text:
            link = cols[1].find('a').get('href')
            print link
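For readers on Python 3, the same loop might look like the sketch below. It assumes the `bs4` package (Beautiful Soup 4) is installed. The function name `filing_links`, the trimmed `SAMPLE_HTML`, and its 2012 row are hypothetical and exist only for illustration. It also checks that the date starts with the year rather than merely containing it, which is a slightly stricter test than the one above.

```python
from bs4 import BeautifulSoup

# Trimmed-down sample: the first data row comes from the question,
# the 2012 row is hypothetical and only there to show the filter working.
SAMPLE_HTML = """
<table class="tableFile2" summary="Results">
  <tr><th>Filings</th><th>Format</th><th>Description</th><th>Filing Date</th></tr>
  <tr>
    <td>8-K</td>
    <td><a href="/Archives/edgar/data/320193/000119312513199324/0001193125-13-199324-index.htm">Documents</a></td>
    <td>Current report</td>
    <td>2013-05-03</td>
  </tr>
  <tr>
    <td>10-K</td>
    <td><a href="/example/2012-filing-index.htm">Documents</a></td>
    <td>Annual report</td>
    <td>2012-10-31</td>
  </tr>
</table>
"""


def filing_links(html, year):
    """Return the column-2 hrefs of rows whose column-4 date starts with `year`."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="tableFile2")
    links = []
    if table is None:
        return links
    for tr in table.find_all("tr"):
        cols = tr.find_all("td")
        # The header row has only <th> cells, so it yields no <td> at all.
        if len(cols) < 4:
            continue
        # Compare against the cell's text, not the Tag object itself.
        if not cols[3].get_text(strip=True).startswith(year):
            continue
        anchor = cols[1].find("a")
        if anchor is not None and anchor.has_attr("href"):
            links.append(anchor["href"])
    return links


print(filing_links(SAMPLE_HTML, "2013"))
```

Returning a list instead of printing inside the loop makes the result easier to reuse, for example to join each relative href with the site's base URL before fetching it.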

A few problems with your previous code:

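class is a reserved word in Python, so soup.find('table', class="tableFile2") is a syntax error. Pass an attribute dictionary instead (e.g. {'class': 'tableFile2'}), or use class_='tableFile2' in Beautiful Soup 4.1.2 and later.
Not every row has at least 4 td cells: the header row contains only th cells, so tr.findAll('td') returns an empty list for it and cols[3] raises an IndexError. Check the length first.
cols[3] is a Tag, not a string, so test the year against cols[3].text. The if line also needs a trailing colon.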
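The two runtime pitfalls can be seen in isolation with a small sketch. It assumes Python 3 and the `bs4` package, and the two-column table is a cut-down, hypothetical stand-in for the real one.

```python
from bs4 import BeautifulSoup

# Hypothetical two-column stand-in for the table in the question.
html = """
<table class="tableFile2">
  <tr><th>Filings</th><th>Filing Date</th></tr>
  <tr><td>8-K</td><td>2013-05-03</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
header_row, data_row = soup.find_all("tr")

# Pitfall 1: the header row holds <th> cells only, so there are no <td> cells to index.
header_cells = header_row.find_all("td")

# Pitfall 2: `in` on a Tag looks through its child nodes for an equal item,
# it does not do a substring search on the text.
date_cell = data_row.find_all("td")[1]
substring_in_tag = "2013" in date_cell
substring_in_text = "2013" in date_cell.get_text()

print(len(header_cells), substring_in_tag, substring_in_text)
```

Here the header row produces no td cells, the membership test on the Tag is false, and the same test on the cell text is true, which is why the working version checks the length and uses the text.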
