I need help setting up Tor on Ubuntu and using it with Scrapy.
I did some research and found this guide:
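    import time
    import telnetlib

    # Import paths as used by the old Scrapy releases that still ship scrapy.log
    from scrapy import log
    from scrapy.contrib.downloadermiddleware.retry import RetryMiddleware


    class RetryChangeProxyMiddleware(RetryMiddleware):

        def _retry(self, request, reason, spider):
            log.msg('Changing proxy')
            # Ask Tor for a new circuit through its control port
            tn = telnetlib.Telnet('127.0.0.1', 9051)
            tn.read_until("Escape character is '^]'.", 2)
            tn.write('AUTHENTICATE "267765"\r\n')
            tn.read_until("250 OK", 2)
            tn.write("signal NEWNYM\r\n")
            tn.read_until("250 OK", 2)
            tn.write("quit\r\n")
            tn.close()
            # Give Tor a moment to build the new circuit
            time.sleep(3)
            log.msg('Proxy changed')
            return RetryMiddleware._retry(self, request, reason, spider)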
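The telnetlib module used above was removed from the standard library in Python 3.13, so on a current interpreter the same control-port exchange could be sketched with a plain socket. This is only a sketch under assumptions: the helper names `build_newnym_commands` and `request_new_identity` are hypothetical, and it assumes the control port is 9051 with password authentication, as in the guide, and a password that contains no quote characters.

```python
import socket


def build_newnym_commands(password):
    """Return the control-port lines that authenticate and request a new circuit."""
    return [
        'AUTHENTICATE "{}"\r\n'.format(password).encode("ascii"),
        b"SIGNAL NEWNYM\r\n",
        b"QUIT\r\n",
    ]


def request_new_identity(password, host="127.0.0.1", port=9051, timeout=5.0):
    """Send the commands; return True if the first two replies start with 250."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        reader = conn.makefile("rb")
        for command in build_newnym_commands(password):
            conn.sendall(command)
            # Each reply is expected to be a single line such as "250 OK"
            replies.append(reader.readline().decode("ascii", "replace").strip())
    return all(reply.startswith("250") for reply in replies[:2])
```

Inside `_retry`, a call such as `request_new_identity("267765")` would stand in for the telnet block, with the password taken from the guide.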
Then enable it in settings.py:

    DOWNLOADER_MIDDLEWARES = {
        'spider.middlewares.RetryChangeProxyMiddleware': 600,
    }
After that, the requests just need to go through a local proxy (polipo), which can be done with:

    tsocks scrapy crawl spider
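For comparison, Scrapy's built-in HttpProxyMiddleware honors a `proxy` key in `request.meta`, so an HTTP proxy such as polipo could also be selected from inside the project instead of wrapping the command. A minimal sketch, assuming polipo listens on its default port 8123 and forwards to Tor; the class name `LocalProxyMiddleware` is hypothetical and not part of the guide:

```python
class LocalProxyMiddleware:
    """Downloader middleware that routes every request through one local HTTP proxy."""

    def __init__(self, proxy_url="http://127.0.0.1:8123"):
        # 8123 is polipo's default port; adjust if the local setup differs
        self.proxy_url = proxy_url

    def process_request(self, request, spider):
        # Scrapy reads the proxy to use from request.meta["proxy"]
        request.meta["proxy"] = self.proxy_url
        return None  # let the request continue through the middleware chain
```

Like the retry middleware, it would have to be listed in `DOWNLOADER_MIDDLEWARES` to take effect.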
Can anyone confirm that this method works and you get different IP addresses?
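One way to check this empirically would be to record the visible address before and after each rotation. The sketch below assumes two hypothetical callables: `fetch_ip`, which returns the address reported by some IP echo service when queried through the proxy, and `rotate`, which sends NEWNYM and waits. Tor may rate-limit NEWNYM, and a new circuit can end up on the same exit, so a repeated address is possible.

```python
def distinct_exit_ips(fetch_ip, rotate, rounds=3):
    """Collect the address seen before and after each rotation; return the distinct ones."""
    seen = [fetch_ip()]
    for _ in range(rounds):
        rotate()  # e.g. send SIGNAL NEWNYM and sleep a few seconds
        seen.append(fetch_ip())
    return set(seen)
```

If the returned set has more than one element, the rotation is changing the exit address at least some of the time.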