Using Python with Selenium to scrape dynamic web pages

There are several links at the top of the site labeled 1, 2, 3 and next. Clicking a numbered link dynamically loads some data into the content div. Clicking next goes to a page with the links 4, 5, 6 and next, and the data for that page is displayed.

I want to scrape the data from the content div for every link clicked (I don't know how many there are; it just shows 3 at a time plus next).

Please provide an example of how to do this. For example, consider www.cnet.com.

Please advise how to download the series of pages using Selenium; I can handle parsing them with Beautiful Soup myself.

1 answer

General layout (not tested):

    #!/usr/bin/env python
    from contextlib import closing

    from selenium.webdriver import Firefox  # pip install selenium
    from selenium.webdriver.common.by import By


    def find_link(browser, text):
        """Return the first link with this exact text, or None if absent."""
        # find_elements returns an empty list instead of raising
        links = browser.find_elements(By.LINK_TEXT, text)
        return links[0] if links else None


    url = "http://example.com"

    # use Firefox to get pages with JavaScript-generated content
    with closing(Firefox()) as browser:
        n = 1
        while n < 10:
            browser.get(url)  # load the page
            link = find_link(browser, str(n))
            while link:
                browser.get(link.get_attribute("href"))  # get individual 1, 2, 3, 4 pages
                #### save(browser.page_source)
                browser.back()  # return to the page that has the 1, 2, 3, next links
                n += 1
                link = find_link(browser, str(n))

            link = find_link(browser, "next")
            if not link:
                break
            url = link.get_attribute("href")
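The `save(browser.page_source)` call in the layout is only a placeholder. Here is a minimal sketch of one way to fill it in, assuming you just want each page's HTML on disk so you can parse it later. The function name `save`, its extra `n` and `out_dir` parameters, the `pages` directory and the file naming scheme are all my assumptions, not something Selenium provides:

```python
from pathlib import Path


def save(page_source, n, out_dir="pages"):
    """Write one page's HTML to <out_dir>/page_<n>.html and return the path.

    Hypothetical helper: the name, signature and file layout are assumptions.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the directory on first use
    path = out / f"page_{n:03d}.html"       # zero-padded so files sort in order
    path.write_text(page_source, encoding="utf-8")
    return path
```

Inside the loop you would call it as `save(browser.page_source, n)`. Each saved file can then be read back and handed to Beautiful Soup, for example `BeautifulSoup(path.read_text(encoding="utf-8"), "html.parser")`.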

Source: https://habr.com/ru/post/1388227/

