Parsing an HTML table with BS4

I have tried different methods of scraping data from this site ( http://nflcombineresults.com/nflcombinedata.php?year=1999&pos=WR&college= ), and none of them seem to work. I tried playing with the indices, but I cannot get it to work. I think I have tried too many things at this point, so if someone could point me in the right direction, I would really appreciate it.

I would like to pull out all the information and export it to a CSV file, but for now I am just trying to get the name and position to print, as a start.

Here is my code:

import urllib2
from bs4 import BeautifulSoup
import re

url = ('http://nflcombineresults.com/nflcombinedata.php?year=1999&pos=&college=')

page = urllib2.urlopen(url).read()

soup = BeautifulSoup(page)
table = soup.find('table')

for row in table.findAll('tr')[0:]:
    col = row.findAll('tr')
    name = col[1].string
    position = col[3].string
    player = (name, position)
    print "|".join(player)

Here is the error I get: line 14, in <module>, name = col[1].string, IndexError: list index out of range.

- UPDATE -

Updated code:

import urllib2
from bs4 import BeautifulSoup
import re

url = ('http://nflcombineresults.com/nflcombinedata.php?year=1999&pos=&college=')

page = urllib2.urlopen(url).read()

soup = BeautifulSoup(page)
table = soup.find('table')


for row in table.findAll('tr')[1:250]:
    col = row.findAll('td')
    name = col[1].getText()
    position = col[3].getText()
    player = (name, position)
    print "|".join(player)

Final code, which exports to CSV:

import urllib2
from bs4 import BeautifulSoup
import csv

url = ('http://nflcombineresults.com/nflcombinedata.php?year=2000&pos=&college=')

page = urllib2.urlopen(url).read()

soup = BeautifulSoup(page)
table = soup.find('table')

f = csv.writer(open("2000scrape.csv", "w"))
f.writerow(["Name", "Position", "Height", "Weight", "40-yd", "Bench", "Vertical", "Broad", "Shuttle", "3-Cone"])
# variable to check length of rows
x = (len(table.findAll('tr')) - 1)
# set to run through x
for row in table.findAll('tr')[1:x]:
    col = row.findAll('td')
    name = col[1].getText()
    position = col[3].getText()
    height = col[4].getText()
    weight = col[5].getText()
    forty = col[7].getText()
    bench = col[8].getText()
    vertical = col[9].getText()
    broad = col[10].getText()
    shuttle = col[11].getText()
    threecone = col[12].getText()
    player = (name, position, height, weight, forty, bench, vertical, broad, shuttle, threecone, )
    f.writerow(player)
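
The export step above can also be written so that it does not need the number of rows in advance. The following is only a minimal sketch in Python 3, not the poster's code: `write_players`, `FIELD_INDEXES`, and the sample data are hypothetical names and made-up values, and each row is assumed to already be a list of cell texts (for example, `[td.getText() for td in row.findAll('td')]`). Rows with too few cells are skipped instead of being sliced off by position.

```python
import csv
import io

# Column positions taken from the script above (name, position, height, ...).
FIELD_INDEXES = [1, 3, 4, 5, 7, 8, 9, 10, 11, 12]
HEADER = ["Name", "Position", "Height", "Weight", "40-yd", "Bench",
          "Vertical", "Broad", "Shuttle", "3-Cone"]


def write_players(rows, out):
    """Write one CSV line per row that has enough cells; return the count."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    written = 0
    for cells in rows:
        # Header or spacer rows have fewer cells; skip them rather than
        # slicing the row list by a precomputed length.
        if len(cells) <= max(FIELD_INDEXES):
            continue
        writer.writerow([cells[i].strip() for i in FIELD_INDEXES])
        written += 1
    return written


# Made-up sample data standing in for the cell texts of real table rows.
sample_rows = [
    ["Year", "Name"],  # too short, so it is skipped
    ["1999", "Player A", "College X", "WR", "72", "200", "", "4.50",
     "15", "35", "120", "4.10", "7.00"],
]

buffer = io.StringIO()
count = write_players(sample_rows, buffer)
print(buffer.getvalue())
```

To write to a real file in Python 3, `out` could be a file opened inside a `with` block using `open("2000scrape.csv", "w", newline="")`, which also closes the file when the block ends.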

The problem in your script is this line:

col = row.findAll('tr')

row is already a tr, so BeautifulSoup finds no tr elements inside it and returns an empty list. What you want is:

col = row.findAll('td')

Also, because the tds contain div and a elements, use getText instead of .string:

name = col[1].getText()
position = col[3].getText()
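
Both points can be checked on a small example. This is a minimal sketch in Python 3 with made-up markup, not the real page: the HTML string and the values in it are assumptions for illustration. It uses `find_all` and `get_text`, the current bs4 spellings of `findAll` and `getText`, and passes `"html.parser"` explicitly so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Made-up markup imitating one table row whose cells wrap their text in tags.
html = """
<table>
  <tr>
    <td><div>1999</div></td>
    <td>
      <a href="#">Player A</a>
    </td>
    <td><div>College X</div></td>
    <td><div>WR</div></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")

# A tr has no tr elements inside it, so this is empty and any index fails.
nested_rows = row.find_all("tr")

# The cells of a row are td elements.
cells = row.find_all("td")

# .string is None when a tag has more than one child; here the td holds
# whitespace text nodes around the a element.
raw_name = cells[1].string

# get_text() joins all text inside the tag, however deeply it is nested.
name = cells[1].get_text(strip=True)
position = cells[3].get_text(strip=True)

print("|".join((name, position)))
```

Passing `strip=True` is optional; it only trims the whitespace that surrounds the nested tags.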

Source: https://habr.com/ru/post/1529281/

