How to get innerHTML of whole page in selenium driver?

I use Selenium to go to the desired web page and then analyze it with Beautiful Soup.

Someone showed how to get the inner HTML of an element in Selenium WebDriver. Is there a way to get the HTML of the whole page? Thanks.

Python code example (judging by the post above, the language does not seem to matter much):

    from selenium import webdriver
    from selenium.webdriver.support.ui import Select
    from bs4 import BeautifulSoup

    url = 'http://www.google.com'
    driver = webdriver.Firefox()
    driver.get(url)

    the_html = driver---somehow----.get_attribute('innerHTML')
    bs = BeautifulSoup(the_html, 'html.parser')
2 answers

To get HTML for the whole page:

    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get("http://stackoverflow.com")
    html = driver.page_source
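To connect this back to the question, here is a minimal sketch of handing `page_source` to Beautiful Soup. The helper names `get_page_html` and `get_page_soup` are made up for illustration and are not part of Selenium or bs4; the sketch assumes `driver` is an already-started WebDriver instance.

```python
def get_page_html(driver, url):
    """Load `url` in an already-started WebDriver and return the page HTML."""
    driver.get(url)
    return driver.page_source


def get_page_soup(driver, url):
    """Return the loaded page parsed with Beautiful Soup's built-in parser."""
    # Imported lazily so get_page_html itself has no third-party dependency.
    from bs4 import BeautifulSoup

    return BeautifulSoup(get_page_html(driver, url), "html.parser")
```

With a real browser this would be used as `driver = webdriver.Firefox()`, then `bs = get_page_soup(driver, 'http://www.google.com')`, followed by `driver.quit()` once the page has been parsed.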

To get the outer HTML (tag included):

    # The find_element_by_* helpers were removed in recent Selenium 4 releases;
    # find_element(By..., ...) works in old and new versions alike.
    from selenium.webdriver.common.by import By

    # HTML from `<html>`
    html = driver.execute_script("return document.documentElement.outerHTML;")

    # HTML from `<body>`
    html = driver.execute_script("return document.body.outerHTML;")

    # HTML from an element, using some JavaScript
    element = driver.find_element(By.CSS_SELECTOR, "#hireme")
    html = driver.execute_script("return arguments[0].outerHTML;", element)

    # HTML from an element, using `get_attribute`
    element = driver.find_element(By.CSS_SELECTOR, "#hireme")
    html = element.get_attribute('outerHTML')

To get the inner HTML (tag excluded):

    # The find_element_by_* helpers were removed in recent Selenium 4 releases;
    # find_element(By..., ...) works in old and new versions alike.
    from selenium.webdriver.common.by import By

    # HTML from `<html>`
    html = driver.execute_script("return document.documentElement.innerHTML;")

    # HTML from `<body>`
    html = driver.execute_script("return document.body.innerHTML;")

    # HTML from an element, using some JavaScript
    element = driver.find_element(By.CSS_SELECTOR, "#hireme")
    html = driver.execute_script("return arguments[0].innerHTML;", element)

    # HTML from an element, using `get_attribute`
    element = driver.find_element(By.CSS_SELECTOR, "#hireme")
    html = element.get_attribute('innerHTML')
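The inner and outer variants differ only in which DOM property is read, so they can be folded into one small helper. This is just a sketch: `element_html` is a hypothetical name, not a Selenium API, and it assumes `driver` exposes the usual `execute_script(script, *args)` method.

```python
def element_html(driver, element=None, outer=False):
    """Return the inner (default) or outer HTML of `element`.

    When `element` is None, the root `<html>` element is used instead.
    """
    prop = "outerHTML" if outer else "innerHTML"
    if element is None:
        # Whole page: read the property from the document's root element.
        return driver.execute_script("return document.documentElement.%s;" % prop)
    # Single element: Selenium passes it to the script as arguments[0].
    return driver.execute_script("return arguments[0].%s;" % prop, element)
```

For example, `element_html(driver)` would return the page without the surrounding `<html>` tag, while `element_html(driver, element, outer=True)` would include the element's own tag.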

Using the page object:

    @FindBy(xpath = "xpath") // placeholder XPath
    private WebElement element;

    public String getInnerHtml() {
        // waitUntilElementToBeClickable is a helper from the answerer's own code (not shown here)
        String innerHtml = waitUntilElementToBeClickable(element, 10).getAttribute("innerHTML");
        System.out.println(innerHtml);
        return innerHtml;
    }

Source: https://habr.com/ru/post/1274269/

