Python: The correct URL to download images from Google Image Search

I am trying to get images from a Google Image search for a specific query. But the page I download contains no pictures, and it redirects me to Google's original page. Here is my code:

    AGENT_ID = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"
    GOOGLE_URL = "https://www.google.com/images?source=hp&q={0}"
    _myGooglePage = ""

    def scrape(self, theQuery):
        self._myGooglePage = subprocess.check_output(
            ["curl", "-L", "-A", self.AGENT_ID,
             self.GOOGLE_URL.format(urllib.quote(theQuery))],
            stderr=subprocess.STDOUT)
        print self.GOOGLE_URL.format(urllib.quote(theQuery))
        print self._myGooglePage
        f = open('./../../googleimages.html', 'w')
        f.write(self._myGooglePage)

What am I doing wrong?

thanks

+6
4 answers

I will give you a hint ... start here:

https://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=JULIE%20NEWMAR

Here JULIE and NEWMAR are your search terms.

This returns the JSON data you need. Parse it with json.load or simplejson.load to get a dict, then dig into it: first find responseData, then the list of results, whose individual elements contain the URLs you want to download.

That said, I am in no way suggesting automated scraping of Google, as their (deprecated) API specifically says not to.
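The parsing step described in this answer can be sketched roughly as follows, in Python 3. The sample payload is hand-written to match the structure named in this thread (responseData, then results, then unescapedUrl); it is not a captured API response, and the helper name extract_image_urls is made up for illustration.

```python
import json

# Hand-written sample in the shape described in this thread;
# it is NOT a real response captured from the API.
SAMPLE_RESPONSE = """
{
  "responseData": {
    "results": [
      {"unescapedUrl": "http://example.com/a.jpg"},
      {"unescapedUrl": "http://example.com/b.jpg"}
    ]
  }
}
"""


def extract_image_urls(payload):
    """Return the image URLs found in a JSON response string."""
    data = json.loads(payload)
    # Assumption: responseData can be null or missing when a request fails,
    # so fall back to empty containers instead of raising.
    response_data = data.get("responseData") or {}
    results = response_data.get("results") or []
    return [item["unescapedUrl"] for item in results if "unescapedUrl" in item]


print(extract_image_urls(SAMPLE_RESPONSE))
```

Each returned URL could then be passed to whatever download routine you prefer.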

+3

This is the Python code that I use to search and download images from Google, hope this helps:

    import os
    import sys
    import time
    from urllib import FancyURLopener
    import urllib2
    import simplejson

    # Define search term
    searchTerm = "hello world"

    # Replace spaces ' ' in search term for '%20' in order to comply with request
    searchTerm = searchTerm.replace(' ','%20')

    # Start FancyURLopener with defined version
    class MyOpener(FancyURLopener):
        version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
    myopener = MyOpener()

    # Set count to 0
    count = 0

    for i in range(0,10):
        # Notice that the start changes for each iteration in order to request a new set of images for each loop
        url = ('https://ajax.googleapis.com/ajax/services/search/images?' + 'v=1.0&q='+searchTerm+'&start='+str(i*4)+'&userip=MyIP')
        print url
        request = urllib2.Request(url, None, {'Referer': 'testing'})
        response = urllib2.urlopen(request)

        # Get results using JSON
        results = simplejson.load(response)
        data = results['responseData']
        dataInfo = data['results']

        # Iterate for each result and get unescaped url
        for myUrl in dataInfo:
            count = count + 1
            print myUrl['unescapedUrl']
            myopener.retrieve(myUrl['unescapedUrl'],str(count)+'.jpg')

        # Sleep for one second to prevent IP blocking from Google
        time.sleep(1)

You can also find very useful information here.
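Replacing spaces with %20 by hand covers only spaces. As a possible alternative, this Python 3 sketch builds the paged request URLs with urllib.parse.urlencode, which also escapes characters such as & in the query. The name build_page_urls and its parameters are hypothetical; the endpoint, the v and start parameters, and the step of 4 are taken from the code in this answer, and the userip parameter is left out.

```python
from urllib.parse import quote, urlencode

# Endpoint as used in this thread (the API is described here as deprecated).
BASE_URL = "https://ajax.googleapis.com/ajax/services/search/images"


def build_page_urls(query, pages=10, page_size=4):
    """Build one request URL per result page for the given search query."""
    urls = []
    for page in range(pages):
        params = [
            ("v", "1.0"),
            ("q", query),
            # start advances by page_size so each request asks for a new set
            ("start", page * page_size),
        ]
        # quote_via=quote encodes spaces as %20 rather than '+'
        urls.append(BASE_URL + "?" + urlencode(params, quote_via=quote))
    return urls


for url in build_page_urls("hello world", pages=3):
    print(url)
```

Passing the parameters as a list of pairs keeps their order in the URL predictable.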

+6

I'll answer this even though the question is old: there is a much easier way to do it.

    import json
    import urllib.request

    def google_image(x):
        # Join the words of the query with '%20'
        search = x.split()
        search = '%20'.join(map(str, search))
        url = 'http://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=%s&safe=off' % search
        search_results = urllib.request.urlopen(url)
        js = json.loads(search_results.read().decode())
        results = js['responseData']['results']
        # Note: the loop overwrites rest each time, so only the last URL is returned
        for i in results:
            rest = i['unescapedUrl']
        return rest

That's all.

0

Source: https://habr.com/ru/post/908660/

