Using Python requests and BeautifulSoup to pull text

Question

Thanks for looking at my problem. I would like to know if there is a way to pull the data-sitekey value out of this markup. Here is the URL of the page: https://e-com.secure.force.com/adidasUSContact/

<div class="g-recaptcha" data-sitekey="6LfI8hoTAAAAAMax5_MTl3N-5bDxVNdQ6Gx6BcKX" data-type="image" id="ncaptchaRecaptchaId"><div style="width: 304px; height: 78px;"><div><iframe src="https://www.google.com/recaptcha/api2/anchor?k=6LfI8hoTAAAAAMax5_MTl3N-5bDxVNdQ6Gx6BcKX&amp;co=aHR0cHM6Ly9lLWNvbS5zZWN1cmUuZm9yY2UuY29tOjQ0Mw..&amp;hl=en&amp;type=image&amp;v=r20160921114513&amp;size=normal&amp;cb=ei2ddcb6rl03" title="recaptcha widget" width="304" height="78" role="presentation" frameborder="0" scrolling="no" name="undefined"></iframe></div><textarea id="g-recaptcha-response" name="g-recaptcha-response" class="g-recaptcha-response" style="width: 250px; height: 40px; border: 1px solid #c1c1c1; margin: 10px 25px; padding: 0px; resize: none;  display: none; "></t

Here is my current code:

    import requests
    from bs4 import BeautifulSoup

    headers = {
        'Host' : 'e-com.secure.force.com',
        'Connection' : 'keep-alive',
        'Upgrade-Insecure-Requests' : '1',
        'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; WOW64)',
        'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Encoding' : 'gzip, deflate, sdch',
        'Accept-Language' : 'en-US,en;q=0.8'
    }
    url = 'https://e-com.secure.force.com/adidasUSContact/'
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r, 'html.parser')
    c = soup.find_all('div', attrs={"class": "data-sitekey"})
    print c

python python-requests beautifulsoup bs4

Tony sanchez Sep 28 '16 at 21:16

1 answer

Padraic cunningham · Answer 1 · 2016-09-28T21:21:41+0000

Now that we have the code, it is as simple as:

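    import requests
    from bs4 import BeautifulSoup

    # Fetch the page and parse the raw bytes of the response body
    soup = BeautifulSoup(requests.get("https://e-com.secure.force.com/adidasUSContact/").content, "html.parser")

    # Find the div by its id, then read its data-sitekey attribute
    key = soup.select_one("#ncaptchaRecaptchaId")["data-sitekey"]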

data-sitekey is an attribute, not a CSS class, so searching for it as a class finds nothing. You just need to locate the element and read the attribute from it; as shown above, you can find the element by its id.
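To see the difference concretely, here is a minimal offline sketch. It parses a trimmed copy of the widget markup from the question instead of fetching the live page (the page, and its site key, may well have changed since 2016), and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the widget markup from the question: only the outer div
# matters here, so the iframe and textarea are left out.
html = (
    '<div class="g-recaptcha" '
    'data-sitekey="6LfI8hoTAAAAAMax5_MTl3N-5bDxVNdQ6Gx6BcKX" '
    'data-type="image" id="ncaptchaRecaptchaId"></div>'
)
soup = BeautifulSoup(html, "html.parser")

# The question's approach treats "data-sitekey" as a class name.
# The div's only class is "g-recaptcha", so nothing matches.
by_class = soup.find_all("div", attrs={"class": "data-sitekey"})
print(by_class)  # []

# The answer's approach: locate the div by id, then read the attribute.
div = soup.select_one("#ncaptchaRecaptchaId")
sitekey = div["data-sitekey"]
print(sitekey)

# .get() returns None instead of raising KeyError when an attribute is absent.
missing = div.get("data-missing")
print(missing)  # None
```

Square-bracket access raises KeyError if the attribute is missing, while .get() lets you check for None without a try/except; which one you want depends on whether a missing key should be treated as an error.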

You can also use the class name:

    # CSS selector
    key = soup.select_one("div.g-recaptcha")["data-sitekey"]
    # Regular find using the class name
    key = soup.find("div", class_="g-recaptcha")["data-sitekey"]
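As a quick check, here is a small offline sketch that runs each lookup against the same trimmed markup from the question and prints what it finds. The last two lookups are not from the answer: a CSS attribute selector, [data-sitekey], and find(attrs={"data-sitekey": True}) both match any element that carries the attribute at all, which may help if neither the id nor the class is stable on a given page.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the widget markup from the question.
html = (
    '<div class="g-recaptcha" '
    'data-sitekey="6LfI8hoTAAAAAMax5_MTl3N-5bDxVNdQ6Gx6BcKX" '
    'data-type="image" id="ncaptchaRecaptchaId"></div>'
)
soup = BeautifulSoup(html, "html.parser")

keys = {
    # Lookups from the answer
    "by id": soup.select_one("#ncaptchaRecaptchaId")["data-sitekey"],
    "css class": soup.select_one("div.g-recaptcha")["data-sitekey"],
    "find class_": soup.find("div", class_="g-recaptcha")["data-sitekey"],
    # Extra lookups (not in the answer): match on the attribute's presence
    "css attribute": soup.select_one("[data-sitekey]")["data-sitekey"],
    "find attrs": soup.find(attrs={"data-sitekey": True})["data-sitekey"],
}

for how, key in keys.items():
    print(f"{how}: {key}")
```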
