Use an already open webpage (with selenium) for beautifulsoup?

Question

I have a webpage open and logged in using webdriver code. I am using webdriver for this because the page requires a login and various other actions before I am ready to scrape.

The goal is to scrape the data from this open page. I need to find links and open them, so there will be a lot of back-and-forth between selenium webdriver and BeautifulSoup.

I looked through the documentation for bs4, and the BeautifulSoup(open("ccc.html")) pattern gives an error when I pass it a URL:

soup = bs4.BeautifulSoup(open("https://m/search.mp?ss=Pr+Dn+Ts"))

OSError: [Errno 22] Invalid argument: 'https://m/search.mp?ss=Pr+Dn+Ts'

I guess this is because it's not a .html file?

python selenium beautifulsoup

Sid Jan 23 '17 at 17:13

1 answer

alecxe · Accepted Answer · 2017-01-23T17:17:38+0000

You are trying to open a page at a web address. The built-in open() only opens local files, so it will not do this; use urlopen() instead:

import bs4
from urllib.request import urlopen  # Python 3
# from urllib2 import urlopen  # Python 2

url = "your target url here"
soup = bs4.BeautifulSoup(urlopen(url), "html.parser")

Or use the "HTTP for Humans" library, requests:

import requests

response = requests.get(url)
soup = bs4.BeautifulSoup(response.content, "html.parser")

Also note that it is strongly recommended to specify the parser explicitly. I used html.parser in this case; other parsers are available.
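To illustrate the parser argument, here is a minimal sketch that parses an inline HTML string with the built-in html.parser. The HTML snippet is made up for the example. Other parsers such as lxml or html5lib can be named in the same position, but they have to be installed separately.

```python
from bs4 import BeautifulSoup

# Made-up HTML, standing in for a real page.
html = "<html><body><p>Hello <a href='/next'>next</a></p></body></html>"

# Name the parser explicitly so the result does not depend on which
# optional parsers happen to be installed on the machine.
soup = BeautifulSoup(html, "html.parser")

link = soup.find("a")
print(link["href"])      # /next
print(link.get_text())   # next
```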

"I want to use the same page (same instance)"

The usual way to do this is to get driver.page_source and pass it to BeautifulSoup for further parsing:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()
driver.get(url)

# wait for page to load..

source = driver.page_source
driver.quit()  # remove this line to leave the browser open

soup = BeautifulSoup(source, "html.parser")
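Since the question mentions finding links and opening them in the same logged-in browser, the pattern above can be extended into a loop. The following is only a sketch under assumptions: extract_links and visit_links are hypothetical helper names (not part of Selenium or BeautifulSoup), and driver is assumed to be a WebDriver instance that is already on the page of interest. It relies only on driver.page_source, driver.current_url and driver.get().

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag in html that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


def visit_links(driver):
    """Parse the driver's current page, then open each link in the same session.

    `driver` is assumed to be an already-running Selenium WebDriver.
    Returns a dict mapping each visited URL to the text of its <title>.
    """
    # Collect the links first, because driver.get() replaces page_source.
    links = extract_links(driver.page_source, driver.current_url)
    titles = {}
    for link in links:
        driver.get(link)  # same browser instance, so the login is kept
        page = BeautifulSoup(driver.page_source, "html.parser")
        titles[link] = page.title.get_text(strip=True) if page.title else None
    return titles
```

For this to work, the driver.quit() call shown earlier would have to be left out, so that the browser stays open between page loads. In real use, each driver.get() may also need an explicit wait before page_source is read.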
