Can Beautiful Soup also navigate web pages?

Beautiful Soup is a Python library for pulling data out of HTML and XML files. I plan to use it to extract data from web pages, but I have not found a way to click the anchor-tag buttons that, in my case, are used to navigate between pages. Do I have to use another tool for this, or does Beautiful Soup have a feature I don't know about?

Please advise me!

+1
1 answer

To answer your tags / comments: yes, you can use Selenium and BeautifulSoup together, and no, you cannot use BeautifulSoup directly to trigger events (clicks, etc.). I have never used the two together in this exact situation myself, but a hypothetical workflow could use Selenium to reach the target page along a specific path (i.e. click() these options, then click() a button on the next page) and then use BeautifulSoup to read driver.page_source (where driver is the Selenium driver you created to "drive" the browser). Since driver.page_source is the HTML of the current page, you can use BeautifulSoup as you normally would and parse out whatever information you need.

A simple example:

    from bs4 import BeautifulSoup
    from selenium import webdriver

    # Create your driver
    driver = webdriver.Firefox()

    # Get a page
    driver.get('http://news.ycombinator.com')

    # Feed the source to BeautifulSoup (name the parser explicitly)
    soup = BeautifulSoup(driver.page_source, 'html.parser')

    print(soup.title)  # <title>Hacker News</title>

    # Close the browser when finished
    driver.quit()

The basic idea is that anytime you need to read the page source, you can pass driver.page_source to BeautifulSoup to read whatever you want.
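To make the "click with Selenium, then parse with BeautifulSoup" workflow concrete, here is a minimal sketch. The helper names (extract_links, click_then_parse, main) and the "new" link text are illustrative assumptions, not something from the question, and the sketch assumes Selenium 4, where the locator strategy string "link text" is the value of By.LINK_TEXT.

```python
from bs4 import BeautifulSoup


def extract_links(driver):
    """Parse the page the driver currently shows; return (text, href) pairs."""
    soup = BeautifulSoup(driver.page_source, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)  # skip anchors without an href
    ]


def click_then_parse(driver, link_text):
    """Click the anchor whose visible text is link_text, then parse the result."""
    # "link text" is the string behind selenium's By.LINK_TEXT constant.
    driver.find_element("link text", link_text).click()
    return extract_links(driver)


def main():
    # Hypothetical usage; running it needs Selenium and a local Firefox install.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get("http://news.ycombinator.com")
        # Assumption: the page has a navigation link whose text is "new".
        for text, href in click_then_parse(driver, "new"):
            print(text, href)
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling main() runs the full flow in a real browser. On pages that load content with JavaScript after the click, page_source may be read too early, so an explicit wait (for example Selenium's WebDriverWait) before parsing is probably needed.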

+1

Source: https://habr.com/ru/post/1270757/

