Scrape Google resultStats using Python

I would like to get the estimated number of results that Google reports for a keyword. I am using Python 3.3 and trying to do this with BeautifulSoup and urllib.request. This is my code so far:

    def numResults():
        try:
            page_google = '''http://www.google.de/#output=search&sclient=psy-ab&q=pokerbonus&oq=pokerbonus&gs_l=hp.3..0i10l2j0i10i30l2.16503.18949.0.20819.10.9.0.1.1.0.413.2110.2-6j1j1.8.0....0...1c.1.19.psy-ab.FEBvxrgi0KU&pbx=1&bav=on.2,or.r_qf.&bvm=bv.48705608,d.Yms&'''
            req_google = Request(page_google)
            req_google.add_header('User Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20120427 Firefox/15.0a1')
            html_google = urlopen(req_google).read()
            soup = BeautifulSoup(html_google)
            scounttext = soup.find('div', id='resultStats')
        except URLError as e:
            print(e)
        return scounttext

My problem is that the content of my soup variable looks encoded somehow, and I cannot get any information out of it. As a result, the function returns None, because soup.find does not find the element.

What am I doing wrong and how can I extract the desired results? Thank you very much!

1 answer

If you haven't solved this problem yet: BeautifulSoup doesn't find anything because resultStats never appears in the soup. Your request (page_google) returns only JavaScript, not the search results that this JavaScript loads dynamically. You can verify this by adding

 print(soup) 

to your code; you will see that the resultStats div is not in the output.
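A likely explanation is that everything after the # in the original URL is a URL fragment, which clients do not send to the server, so Google only receives a request for the bare start page. The sketch below uses the standard library's urllib.parse.urlsplit to show how a shortened version of that URL is split; the shortened URL is an illustration, not the full one from the question.

```python
from urllib.parse import urlsplit

# Shortened form of the URL from the question (illustrative only).
url = "http://www.google.de/#output=search&sclient=psy-ab&q=pokerbonus"

parts = urlsplit(url)

# The path and query are what the server actually receives.
print("path:    ", repr(parts.path))
print("query:   ", repr(parts.query))
# The fragment stays on the client side and is never transmitted.
print("fragment:", repr(parts.fragment))
```

The query component is empty and the search term sits entirely in the fragment, which is why the response contains no results for it.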

The following code (Python 2, hence urllib2), which requests the plain /search URL instead:

    import sys
    from urllib2 import Request, urlopen
    import urllib
    from bs4 import BeautifulSoup

    query = 'pokerbonus'
    url = "http://www.google.de/search?q=%s" % urllib.quote_plus(query)
    req_google = Request(url)
    req_google.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
    html_google = urlopen(req_google).read()
    soup = BeautifulSoup(html_google)
    scounttext = soup.find('div', id='resultStats')
    print(scounttext)

prints:

    <div class="sd" id="resultStats">Ungefähr 1.060.000 Ergebnisse</div>
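Since the question targets Python 3.3, note that urllib2 and urllib.quote_plus do not exist there; the equivalents live in urllib.request and urllib.parse. The following is a minimal sketch of the two offline steps in Python 3: building the /search URL and turning the text of the resultStats div into an integer. The helper names build_search_url and parse_result_count are hypothetical, and the parser assumes that "." and "," inside the number are thousands separators only.

```python
import re
from urllib.parse import quote_plus


def build_search_url(query, host="www.google.de"):
    """Return a /search URL whose query string is sent to the server."""
    return "http://%s/search?q=%s" % (host, quote_plus(query))


def parse_result_count(stats_text):
    """Extract the first number from text like 'Ungefähr 1.060.000 Ergebnisse'.

    Assumption: '.' and ',' inside the number are thousands separators.
    Returns None when the text contains no digits.
    """
    match = re.search(r"\d[\d.,]*", stats_text)
    if match is None:
        return None
    # Drop the separators before converting to an integer.
    return int(re.sub(r"[.,]", "", match.group()))


print(build_search_url("poker bonus"))
print(parse_result_count("Ungefähr 1.060.000 Ergebnisse"))
```

With BeautifulSoup, the text to pass in would be scounttext.get_text(), after checking that scounttext is not None.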

Finally, a tool like Selenium WebDriver may be the best way to handle this, since Google does not allow bots to scrape its search results.


Source: https://habr.com/ru/post/1489980/

