Python Web Scraping (Beautiful Soup, Selenium and PhantomJS): only scraping part of the full page

Hello, I am having trouble scraping data from a website for modeling purposes (fantsylabs dotcom). I'm just a hack, so forgive my ignorance of comp sci lingo. What I'm trying to accomplish is ...

  • Use selenium to log in to the website and navigate to the page with the data.

    import time

    from bs4 import BeautifulSoup
    from selenium import webdriver

    ## Initialize and load the web page
    url = "website url"
    driver = webdriver.Firefox()
    driver.get(url)
    time.sleep(3)
    
    ## Fill out forms and login to site
    username = driver.find_element_by_name('input')
    password = driver.find_element_by_name('password')
    username.send_keys('username')
    password.send_keys('password')
    login_attempt = driver.find_element_by_class_name("pull-right")
    login_attempt.click()
    
    ## Find and open the page with the data that I wish to scrape
    link = driver.find_element_by_partial_link_text('Player Models')
    link.click()
    time.sleep(10)
    
    ##UPDATED CODE TO TRY AND SCROLL DOWN TO LOAD ALL THE DYNAMIC DATA
    scroll = driver.find_element_by_class_name("ag-body-viewport")
    driver.execute_script("arguments[0].scrollIntoView();", scroll)
    
    ## Try to allow time for the full page to load the lazy way then pass to BeautifulSoup
    time.sleep(10)
    html2 = driver.page_source
    
    ## page_source is already a str, so from_encoding would be ignored
    soup = BeautifulSoup(html2, "lxml")
    div = soup.find_all('div', {'class':'ag-pinned-cols-container'})
    ## continue to scrape what I want
    

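The problem is that only part of the table (about 30 rows) gets passed to BeautifulSoup. There are around 300 rows on the page, but bs4 only sees 30 of the 300. Is it that the JavaScript only renders part of the HTML before it reaches bs4? (As I understand it, Python's requests.get only returns the raw HTML source, without running any JavaScript.)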

Secondly, would switching to PhantomJS, as below, make any difference?

    driver = webdriver.PhantomJS() ##instead of webdriver.Firefox()

Lastly, I've looked into scrolling/executing JavaScript, but have not gotten it to work.


Update:

[Screenshot from the original post; image not available.]


First, lxml may be mis-parsing this HTML, so try switching to a different parser:

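    # Either Python's built-in parser ...
    soup = BeautifulSoup(html2, "html.parser")
    # ... or html5lib (requires the html5lib package to be installed)
    soup = BeautifulSoup(html2, "html5lib")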

Also, you may not need BeautifulSoup here at all. Selenium can locate the elements itself, for example:

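    # Note: no stray quote inside the CSS selector string
    for div in driver.find_elements_by_css_selector(".ag-pinned-cols-container"):
        # do something with 'div', e.g. read its visible text
        print(div.text)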
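
One possible explanation for getting about 30 rows out of roughly 300 is that the grid only keeps the currently visible rows in the DOM, in which case no parser can see the rest. A minimal sketch, assuming that is what happens here: scroll the grid's own viewport in steps and accumulate rows as they are rendered. The `.ag-body-viewport` and `.ag-pinned-cols-container` classes come from the question; the `.ag-row` class, the `row-index` attribute, and the helper name `collect_grid_rows` are assumptions for illustration and should be checked against the actual page markup.

```python
import time

# JavaScript run in the page: return [key, text] pairs for the rows currently rendered.
# arguments[0] is a CSS selector for rows, arguments[1] an attribute used as a row key.
READ_ROWS_JS = """
var sel = arguments[0], attr = arguments[1];
return Array.prototype.map.call(document.querySelectorAll(sel), function (el) {
    return [el.getAttribute(attr), el.textContent];
});
"""

# JavaScript run in the page: scroll the viewport down and report the new position.
SCROLL_JS = """
var vp = document.querySelector(arguments[0]);
vp.scrollTop += arguments[1];
return vp.scrollTop;
"""


def collect_grid_rows(driver,
                      viewport_css=".ag-body-viewport",                 # from the question
                      row_css=".ag-pinned-cols-container .ag-row",      # ".ag-row" is assumed
                      key_attr="row-index",                             # assumed attribute
                      step_px=400, pause=0.5, max_steps=200):
    """Scroll the grid step by step and return the text of every row seen."""
    rows = {}          # key -> row text; dicts keep insertion order
    last_top = -1
    for _ in range(max_steps):
        # Record whatever rows are rendered at the current scroll position.
        for key, text in driver.execute_script(READ_ROWS_JS, row_css, key_attr):
            rows.setdefault(key if key is not None else text, text)
        # Scroll further; if the position did not change, the bottom was reached.
        top = driver.execute_script(SCROLL_JS, viewport_css, step_px)
        if top == last_top:
            break
        last_top = top
        time.sleep(pause)  # give the grid time to render the new rows
    return list(rows.values())
```

After logging in and opening the data page, this would be called as `rows = collect_grid_rows(driver)`. The step size should stay smaller than the visible height of the grid so that no rows are skipped between reads.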


Source: https://habr.com/ru/post/1624135/
