How to get tbody from a table with Python Beautiful Soup?

Question

I am trying to scrape the Year and Winners columns (first and second columns) from the “List of final matches” table (the second table) at http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals. I am using the following code:

    import urllib2
    from BeautifulSoup import BeautifulSoup

    url = "http://www.samhsa.gov/data/NSDUH/2k10State/NSDUHsae2010/NSDUHsaeAppC2010.htm"
    soup = BeautifulSoup(urllib2.urlopen(url).read())

    soup.findAll('table')[0].tbody.findAll('tr')
    for row in soup.findAll('table')[0].tbody.findAll('tr'):
        first_column = row.findAll('th')[0].contents
        third_column = row.findAll('td')[2].contents
        print first_column, third_column

With the above code, I was able to get the first and third columns just fine. But when I use the same code with http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals , it cannot find tbody as an element of the table, even though I can see the tbody when I inspect the element in the browser.

    url = "http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals"
    soup = BeautifulSoup(urllib2.urlopen(url).read())

    print soup.findAll('table')[2]

    soup.findAll('table')[2].tbody.findAll('tr')
    for row in soup.findAll('table')[0].tbody.findAll('tr'):
        first_column = row.findAll('th')[0].contents
        third_column = row.findAll('td')[2].contents
        print first_column, third_column

Here is the error I got:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-150-fedd08c6da16> in <module>()
          7 # print soup.findAll('table')[2]
          8
    ----> 9 soup.findAll('table')[2].tbody.findAll('tr')
         10 for row in soup.findAll('table')[0].tbody.findAll('tr'):
         11     first_column = row.findAll('th')[0].contents

    AttributeError: 'NoneType' object has no attribute 'findAll'

python web-scraping beautifulsoup

Jpc Dec 11 '13 at 15:15

2 answers

Derek litz · Answer 1 · 2013-12-11T15:30:22+0000

If you are looking at the page with the browser's element inspector, keep in mind that the browser inserts tbody tags itself.

The page source may or may not contain them. I suggest using "view source" if you really want to know.

In any case, you do not need to go through tbody at all; simply:

soup.findAll('table')[0].findAll('tr') should work.
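To make the point above concrete, here is a minimal sketch. It assumes Python 3 with the `bs4` package and the built-in `html.parser` backend, which is not what the question uses (the question uses Python 2 with the older `BeautifulSoup` module), and `SAMPLE_HTML` is a made-up stand-in for a page whose source has no tbody tags.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: a table written without <tbody>,
# as the raw page source may be.
SAMPLE_HTML = """
<table>
  <tr><th>Year</th><th>Winners</th></tr>
  <tr><td>1930</td><td>Uruguay</td></tr>
  <tr><td>1934</td><td>Italy</td></tr>
</table>
"""

# html.parser keeps the markup as written; it does not add <tbody>.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find_all("table")[0]

# There is no <tbody> in the parsed tree, so this attribute is None.
# Calling a method on it is what raises the AttributeError.
tbody = table.tbody
print(tbody)

# Searching the table directly finds the rows either way.
rows = table.find_all("tr")
print(len(rows))
```

Searching from the table works whether or not a tbody is present, because the search descends through any intermediate tags.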

GMPrazzoli · Answer 2 · 2013-12-11T15:32:49+0000

    url = "http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals"
    soup = BeautifulSoup(urllib2.urlopen(url).read())

    # Iterate over the rows of the third table directly, without tbody.
    for tr in soup.findAll('table')[2].findAll('tr'):
        pass  # get data from tr here

And then find what you need in the table :)
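As a rough illustration of the "get data" step, the sketch below pulls the first two columns out of each row. It again assumes Python 3 with `bs4` rather than the question's Python 2 setup; `SAMPLE_HTML` and the helper `first_two_columns` are hypothetical, and the real Wikipedia table may be laid out differently, so the cell handling would need checking against the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the last row has a single spanning cell.
SAMPLE_HTML = """
<table>
  <tr><th>Year</th><th>Winners</th><th>Score</th></tr>
  <tr><th>1930</th><td>Uruguay</td><td>4-2</td></tr>
  <tr><th>1934</th><td>Italy</td><td>2-1</td></tr>
  <tr><td colspan="3">note</td></tr>
</table>
"""


def first_two_columns(table):
    """Return (first, second) cell text for every row with at least two cells."""
    pairs = []
    for tr in table.find_all("tr"):
        # Treat <th> and <td> alike, looking only at the row's own cells.
        cells = tr.find_all(["th", "td"], recursive=False)
        if len(cells) < 2:
            continue  # skip spanning or otherwise short rows
        pairs.append((cells[0].get_text(strip=True),
                      cells[1].get_text(strip=True)))
    return pairs


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
pairs = first_two_columns(soup.find_all("table")[0])
for year, winner in pairs:
    print(year, winner)
```

Checking the number of cells before indexing avoids an IndexError on rows that do not have the expected columns.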
