How to get tbody from a table with Python BeautifulSoup?

I am trying to scrape the Year and Winners columns (the first and second columns) from the table “List of final matches” (the second table) at http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals . I use the following code:

    import urllib2
    from BeautifulSoup import BeautifulSoup

    url = "http://www.samhsa.gov/data/NSDUH/2k10State/NSDUHsae2010/NSDUHsaeAppC2010.htm"
    soup = BeautifulSoup(urllib2.urlopen(url).read())
    soup.findAll('table')[0].tbody.findAll('tr')
    for row in soup.findAll('table')[0].tbody.findAll('tr'):
        first_column = row.findAll('th')[0].contents
        third_column = row.findAll('td')[2].contents
        print first_column, third_column

With the above code, I was able to get the first and third columns just fine. But when I use the same code on http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals , it cannot find tbody as an element, even though I can see the tbody when I inspect the element.

    url = "http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals"
    soup = BeautifulSoup(urllib2.urlopen(url).read())
    print soup.findAll('table')[2]
    soup.findAll('table')[2].tbody.findAll('tr')
    for row in soup.findAll('table')[0].tbody.findAll('tr'):
        first_column = row.findAll('th')[0].contents
        third_column = row.findAll('td')[2].contents
        print first_column, third_column

Here is the error I got:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-150-fedd08c6da16> in <module>()
          7 # print soup.findAll('table')[2]
          8
    ----> 9 soup.findAll('table')[2].tbody.findAll('tr')
         10 for row in soup.findAll('table')[0].tbody.findAll('tr'):
         11     first_column = row.findAll('th')[0].contents

    AttributeError: 'NoneType' object has no attribute 'findAll'
2 answers

If you are looking at the page with the browser's inspect-element tool, keep in mind that the browser inserts tbody tags itself.

The page source may or may not contain them. I suggest looking at the raw source (View Source) if you really want to know.
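A rough way to check the raw source from code rather than from the browser is sketched below. The helper name `has_tbody` and the two sample strings are hypothetical, made up for illustration. In practice you would pass in the text you downloaded. It is a naive substring test, so it would also match a tbody that appears inside a comment or script.

```python
def has_tbody(raw_html):
    """Return True if the raw markup contains a literal <tbody> tag."""
    return "<tbody" in raw_html.lower()


# Hypothetical sample markup, not taken from any real page.
raw_without = "<table><tr><td>1930</td></tr></table>"
raw_with = "<table><tbody><tr><td>1930</td></tr></tbody></table>"

print(has_tbody(raw_without))
print(has_tbody(raw_with))
```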

In any case, you do not need to go through tbody at all; simply:

soup.findAll('table')[0].findAll('tr') should work.
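To illustrate why skipping tbody is safe, here is a minimal sketch. It assumes the newer `bs4` package (BeautifulSoup 4) on Python 3, where `find_all` is the current spelling of `findAll`, and it uses a small hypothetical HTML string instead of the real page. Because the row search is recursive, it finds the rows whether or not a tbody wraps them.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with no <tbody>, as raw page source often looks.
html_without = """
<table>
  <tr><th>Year</th><th>Winners</th></tr>
  <tr><td>1930</td><td>Uruguay</td></tr>
</table>
"""

# The same hypothetical table, this time with an explicit <tbody>.
html_with = """
<table><tbody>
  <tr><th>Year</th><th>Winners</th></tr>
  <tr><td>1930</td><td>Uruguay</td></tr>
</tbody></table>
"""

table_without = BeautifulSoup(html_without, "html.parser").find_all("table")[0]
table_with = BeautifulSoup(html_with, "html.parser").find_all("table")[0]

# With the "html.parser" builder no <tbody> is added, so this is None
# and chaining .find_all() on it would raise AttributeError.
print(table_without.tbody)

# Searching from the table itself works in both cases.
rows_without = table_without.find_all("tr")
rows_with = table_with.find_all("tr")
print(len(rows_without), len(rows_with))
```

Which parser is used matters here: some parsers repair the markup the way a browser does and may add a tbody, so searching from the table is the more portable choice.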

    import urllib2
    from BeautifulSoup import BeautifulSoup

    url = "http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals"
    soup = BeautifulSoup(urllib2.urlopen(url).read())
    for tr in soup.findAll('table')[2].findAll('tr'):
        pass  # get data from each row here

And then find what you need in the table :)
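As a sketch of what "find what you need" could look like for the Year and Winners columns, the example below assumes the `bs4` package on Python 3 and runs on a simplified, hypothetical stand-in table, not the actual Wikipedia markup. It collects both th and td cells per row, so it does not matter which tag the page uses for the first column.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical stand-in for the finals table.
html = """
<table>
  <tr><th>Year</th><th>Winners</th><th>Runners-up</th></tr>
  <tr><th>1930</th><td>Uruguay</td><td>Argentina</td></tr>
  <tr><th>1934</th><td>Italy</td><td>Czechoslovakia</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

results = []
for tr in table.find_all("tr"):
    # Take header and data cells together, in document order.
    cells = tr.find_all(["th", "td"])
    if len(cells) < 2:
        continue  # skip rows that do not have both columns
    year = cells[0].get_text(strip=True)
    winner = cells[1].get_text(strip=True)
    results.append((year, winner))

# The first entry is the header row, so skip it when printing.
for year, winner in results[1:]:
    print(year, winner)
```

On the real page the table index and the cell layout may differ, so print a row or two first and adjust the indexes accordingly.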


Source: https://habr.com/ru/post/1244528/

