I would not create an Item and would not use the ImagesPipeline.
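```python
import urllib
import os
import subprocess

from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector

...

    def start_requests(self):
        request = Request("http://webpagewithcaptchalogin.com/",
                          callback=self.fill_login_form)
        return [request]

    def fill_login_form(self, response):
        x = HtmlXPathSelector(response)
        # extract() returns a list with the src of every <img> on the page
        img_src = x.select("//img/@src").extract()
        ...
```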
Here I use `urllib.urlretrieve(url)` to save the image, `os.remove(file)` to delete the previous image, and `subprocess.check_output` to call the external command-line utility that solves the captcha. None of the Scrapy infrastructure is used in this hack, because a solution to this problem is always a hack.
All this calling out to an external subprocess could be done more elegantly, but it works.
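The three steps (delete the old image, download the new one, run an external solver) can be sketched as a small helper. This is a minimal sketch, not the code from this answer: it uses Python 3, where `urlretrieve` lives in `urllib.request`, and `solve_captcha`, `solver_cmd` and `img_path` are hypothetical names. The solver utility is left unspecified; it is assumed to take the image path as its last argument and print the captcha text on stdout.

```python
import os
import subprocess
import urllib.request


def solve_captcha(img_url, solver_cmd, img_path="captcha.png"):
    """Download a captcha image and pass it to an external solver.

    solver_cmd is a hypothetical command-line utility given as a list of
    arguments; the image path is appended as its last argument and the
    utility is assumed to print the recognised text on stdout.
    """
    # Delete the image left over from a previous attempt, if any.
    if os.path.exists(img_path):
        os.remove(img_path)
    # Save the captcha image to disk.
    urllib.request.urlretrieve(img_url, img_path)
    # Run the external solver and capture what it prints.
    output = subprocess.check_output(solver_cmd + [img_path])
    return output.decode().strip()
```

Inside `fill_login_form` you would call this with the absolute URL built from `img_src` and then put the returned text into the login form request.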
On some sites it is not possible to save the captcha image directly; you have to open the page in a browser, call a screen_capture utility, and crop at the exact position to “cut out” the captcha. Now that is screen scraping.