I am trying to crawl http://everydayhealth.com. However, the page content is loaded dynamically: when I click the "More" button, more news items are displayed. But after clicking the button with splinter, `browser.html` does not reflect the updated content. Is there a way to get the latest HTML source using either splinter or Selenium? My splinter code looks like this:
```python
import requests
from bs4 import BeautifulSoup
from splinter import Browser

browser = Browser()
browser.visit('http://everydayhealth.com')
browser.click_link_by_text("More")
print(browser.html)
```
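A likely explanation, though the question does not confirm it, is timing rather than a stale `browser.html`. For WebDriver-backed browsers, splinter's `browser.html` reads Selenium's `page_source`, which serializes the DOM at the moment it is read, while `click()` only starts the request that loads the new items. One way to handle this is to count the items already on the page, click, and poll until the count grows before reading the HTML. The sketch below does that under those assumptions; `click_and_get_updated_html`, `wait_until`, and the item XPath argument are hypothetical names, and the polling loop is a plain-Python stand-in for Selenium's `WebDriverWait(...).until(...)` so the helper itself does not import Selenium.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# "xpath" is the string value of selenium.webdriver.common.by.By.XPATH,
# so passing it directly keeps this helper free of a Selenium import.
XPATH = "xpath"


def wait_until(condition: Callable[[], T], timeout: float = 10.0, poll: float = 0.5) -> T:
    """Call `condition` until it returns something truthy; raise TimeoutError after `timeout` s.

    Same idea as Selenium's WebDriverWait(driver, timeout).until(...), in plain Python.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


def click_and_get_updated_html(driver, button_xpath: str, item_xpath: str,
                               timeout: float = 10.0) -> str:
    """Click a "load more" button and return the DOM once new items have appeared.

    `driver` is assumed to be a Selenium WebDriver (with splinter: `browser.driver`).
    `item_xpath` is a hypothetical XPath that matches one news item on the page.
    """
    # Count the items that are already in the DOM before clicking.
    before = len(driver.find_elements(XPATH, item_xpath))
    driver.find_element(XPATH, button_xpath).click()
    # The click only starts an asynchronous request; poll until the item count grows.
    wait_until(lambda: len(driver.find_elements(XPATH, item_xpath)) > before, timeout)
    # page_source serializes the current DOM, including nodes added by JavaScript.
    return driver.page_source
```

With splinter, the underlying Selenium driver is `browser.driver`, so after `browser.visit(...)` a call could look like `html = click_and_get_updated_html(browser.driver, '//a[@class="btn-more"]', ITEM_XPATH)`, where the button XPath is the one used later in the question and `ITEM_XPATH` is a placeholder for an XPath matching a single news item.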
Based on @Louis's answer, I rewrote the program as follows:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Firefox()
driver.get("http://www.everydayhealth.com")

# Wait for the "More" button to appear, then click it.
more_xpath = '//a[@class="btn-more"]'
more_btn = WebDriverWait(driver, 10).until(
    lambda d: d.find_element(By.XPATH, more_xpath))
more_btn.click()

# Wait until this link appears a second time, taken as a sign that new items loaded.
more_news_xpath = '(//a[@href="http://www.everydayhealth.com/recipe-rehab/5-herbs-and-spices-to-intensify-flavor.aspx"])[2]'
WebDriverWait(driver, 5).until(
    lambda d: d.find_element(By.XPATH, more_news_xpath))

# Serialize the current DOM, including content added by JavaScript.
print(driver.execute_script("return document.documentElement.outerHTML;"))
driver.quit()
```
However, the output still does not contain the text from the updated page. For example, when I search it for "The milk of your friend or enemy?", nothing is found. What is the problem?
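Two things may be worth checking, both assumptions since the question does not show the page's markup. First, an indexed XPath such as `(//a[@href=...])[2]` succeeds as soon as two copies of that link exist, which could already be true before the new batch arrives. Second, the headline may be written differently in the HTML than in the search string: typographic quotes, HTML entities such as `&#8217;`, or inline tags splitting the phrase all defeat a plain substring search on raw HTML. The sketch below addresses the second point using only the standard library's `html.parser`: it extracts visible text, skipping `<script>` and `<style>`, and normalizes quotes, whitespace, and case before searching. `page_contains` and `normalize` are hypothetical helper names.

```python
import re
from html.parser import HTMLParser

# Map typographic quotes to their ASCII equivalents before comparing.
_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &#8217; arrives as a real character
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def normalize(text: str) -> str:
    """Use ASCII quotes, collapse whitespace to single spaces, and ignore case."""
    text = text.translate(_QUOTES)
    return re.sub(r"\s+", " ", text).strip().casefold()


def page_contains(html: str, phrase: str) -> bool:
    """Return True if `phrase` occurs in the visible text of `html` after normalization."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Text nodes are joined without separators so inline tags like <em> don't split a phrase.
    return normalize(phrase) in normalize("".join(parser.parts))
```

The HTML string printed by the question's second snippet could be passed straight to `page_contains` together with the headline being searched for.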