Loading a scraped table through BS4 into a Pandas DataFrame

Question

I am trying to scrape either the Basic Box Score Stats or the Advanced Box Score Stats table from this page (the URL appears in the code below).

I tried to do something like this:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "http://www.basketball-reference.com/boxscores/200112100LAC.html"
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'})
soup = BeautifulSoup(page.content, "html5lib")

table =  soup.find('div', class_='overthrow table_container').find('table',class_='sortable stats_table')
df = pd.read_html(table)
print df

However, it fails with a NoneType object error. Is there a better way to extract the table and load it into a DataFrame? Thanks.

+4

python pandas beautifulsoup

Ravash jalil Dec 12 '16 at 12:01

2 answers

table is a Tag object in BeautifulSoup; you must convert it to a string before passing it to pandas.

prettify() turns a Beautiful Soup parse tree into a nicely formatted Unicode string, with each HTML/XML tag on its own line:

df = pd.read_html(table.prettify())[0]  # read_html returns a list of DataFrames
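
As a rough illustration of the string conversion described above, here is a minimal sketch that runs without a network connection. The inline HTML, the class name stats_table, and the player rows are made up for the example and are not taken from the real page. The StringIO wrapper is there because recent pandas versions warn when read_html is given a literal HTML string. The sketch also assumes an HTML parser that pandas can use (lxml, or bs4 with html5lib) is installed.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page content.
html = """
<div class="table_container">
  <table class="stats_table">
    <thead><tr><th>Player</th><th>PTS</th></tr></thead>
    <tbody>
      <tr><td>Player A</td><td>20</td></tr>
      <tr><td>Player B</td><td>15</td></tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="stats_table")

# find() returns None when nothing matches; checking here avoids the
# NoneType error from chaining another call onto a missing element.
if table is None:
    raise ValueError("no matching table found")

# Convert the Tag to a string, then let pandas parse it.
tables = pd.read_html(StringIO(str(table)))
df = tables[0]
print(df)
```

The explicit None check is one way to surface a selector that does not match the page, which is a likely cause of the NoneType error in the question.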

+2

宏杰李 Dec 12 '16 at 12:06

jezrael · Accepted Answer · 2016-12-12T12:02:48+0000

You can use read_html, which returns a list of DataFrames, one for each parsed table:

df = pd.read_html('http://www.basketball-reference.com/boxscores/200112100LAC.html')[0]  # or [1], [2], ... for the other tables
print(df)
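
To show what the returned list looks like, here is a small offline sketch. It uses a made-up HTML snippet with two tables; the ids basic and advanced and the cell values are hypothetical and may not match the real page. It selects a table either by position or with the attrs parameter of read_html.

```python
from io import StringIO

import pandas as pd

# Hypothetical page fragment containing two tables.
html = """
<table id="basic">
  <thead><tr><th>Player</th><th>PTS</th></tr></thead>
  <tbody><tr><td>Player A</td><td>20</td></tr></tbody>
</table>
<table id="advanced">
  <thead><tr><th>Player</th><th>TS%</th></tr></thead>
  <tbody><tr><td>Player A</td><td>0.55</td></tr></tbody>
</table>
"""

# One DataFrame per <table> element, in document order.
tables = pd.read_html(StringIO(html))
print(len(tables))

# Pick a table by its position in the list...
basic = tables[0]

# ...or ask read_html to return only tables with matching attributes.
advanced = pd.read_html(StringIO(html), attrs={"id": "advanced"})[0]
print(advanced)
```

Selecting by attribute tends to be more robust than a fixed index if the page layout changes, provided the table has a stable id.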
