Loading a scraped table through BS4 into a Pandas DataFrame

I am trying to capture any of the Basic Box Score Stats or Advanced Box Score Stats tables from the page at the URL in the code below.

I tried to do something like this:

url = "http://www.basketball-reference.com/boxscores/200112100LAC.html"
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'})
soup = BeautifulSoup(page.content, "html5lib")

table =  soup.find('div', class_='overthrow table_container').find('table',class_='sortable stats_table')
df = pd.read_html(table)
print df

However, it fails with a NoneType object error. Is there a better way to extract the table and load it into a DataFrame? Thanks.

+4
2 answers

You can use read_html, which returns a list of DataFrames, one for each table parsed from the page:

df = pd.read_html('http://www.basketball-reference.com/boxscores/200112100LAC.html')[0] # or [1], [2]
print(df)
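
When a page holds several tables, picking one by position can be brittle. The following is a minimal sketch, assuming a small inline HTML snippet in place of the real page (the table ids `basic` and `advanced` and all values are made up for illustration). It shows how the `attrs` and `match` arguments of read_html can narrow the result:

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the downloaded page: two small tables.
HTML = """
<table id="basic">
  <thead><tr><th>Player</th><th>PTS</th></tr></thead>
  <tbody>
    <tr><td>A</td><td>10</td></tr>
    <tr><td>B</td><td>7</td></tr>
  </tbody>
</table>
<table id="advanced">
  <thead><tr><th>Player</th><th>TS%</th></tr></thead>
  <tbody>
    <tr><td>A</td><td>0.55</td></tr>
    <tr><td>B</td><td>0.61</td></tr>
  </tbody>
</table>
"""

# read_html always returns a list, even when only one table matches.
all_tables = pd.read_html(StringIO(HTML))

# attrs filters on HTML attributes of the <table> element itself.
advanced = pd.read_html(StringIO(HTML), attrs={"id": "advanced"})[0]

# match keeps only tables whose text matches the given string or regex.
basic = pd.read_html(StringIO(HTML), match="PTS")[0]

print(len(all_tables))
print(advanced)
print(basic)
```

On the real page, the same arguments would take whatever id or header text the target table actually carries; those have to be read from the page source.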
+7

table is a BeautifulSoup Tag object; you must convert it to a string before passing it to pandas.

prettify() turns a Beautiful Soup parse tree into a nicely formatted Unicode string of HTML/XML:

df = pd.read_html(table.prettify())[0]  # read_html returns a list of DataFrames
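
The whole flow can be sketched end to end. This is an illustration under assumptions, not the asker's exact page: the inline HTML, the class names and the numbers are invented, and the built-in `html.parser` is used in place of html5lib. The sketch also guards against find() returning None, which is what produces a NoneType error when a selector matches nothing, and it wraps the string in StringIO because recent pandas versions deprecate passing literal HTML to read_html directly.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for page.content.
HTML = """
<div class="table_container">
  <table class="stats_table">
    <thead><tr><th>Player</th><th>PTS</th></tr></thead>
    <tbody>
      <tr><td>A</td><td>10</td></tr>
      <tr><td>B</td><td>7</td></tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() returns None when nothing matches, so check each step
# instead of chaining .find(...).find(...).
container = soup.find("div", class_="table_container")
table = container.find("table", class_="stats_table") if container else None
if table is None:
    raise ValueError("table not found; check the selectors against the page source")

# read_html expects a string or file-like object, not a Tag,
# so convert the Tag with str() (prettify() works as well).
df = pd.read_html(StringIO(str(table)))[0]
print(df)
```

If the lookup raises on the real page, comparing the selectors with the HTML actually returned by requests is a reasonable first check.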
+2

Source: https://habr.com/ru/post/1663507/

