Using Tor proxies with scrapy

I need help setting up Tor on Ubuntu and using it with Scrapy.

I did some research and pieced together the following approach. First, a retry middleware that asks Tor for a new identity before each retry:

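```python
import telnetlib  # deprecated in Python 3.11, removed in 3.13
import time

from scrapy.downloadermiddlewares.retry import RetryMiddleware


class RetryChangeProxyMiddleware(RetryMiddleware):
    def _retry(self, request, reason, spider):
        spider.logger.info('Changing proxy')
        # Connect to Tor's control port; the password must match the
        # HashedControlPassword configured in torrc.
        tn = telnetlib.Telnet('127.0.0.1', 9051)
        # telnetlib writes and matches bytes on Python 3.
        tn.write(b'AUTHENTICATE "267765"\r\n')
        tn.read_until(b"250 OK", 2)
        # Ask Tor to switch to new circuits (usually a new exit IP).
        tn.write(b"signal NEWNYM\r\n")
        tn.read_until(b"250 OK", 2)
        tn.write(b"quit\r\n")
        tn.close()
        # Give Tor time to build the new circuit; note this blocks the reactor.
        time.sleep(3)
        spider.logger.info('Proxy changed')
        return super()._retry(request, reason, spider)
```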
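Since telnetlib is gone from the standard library as of Python 3.13, the same control-port exchange can be done with a plain socket. The sketch below is an assumption-based alternative, not part of the original recipe: `renew_tor_identity` is a hypothetical helper name, and it assumes Tor's control port on 127.0.0.1:9051 with password authentication (the password must not contain a double quote). It sends `AUTHENTICATE`, `SIGNAL NEWNYM` and `QUIT`, and raises if any reply does not start with status code 250.

```python
import socket


def renew_tor_identity(password, host="127.0.0.1", port=9051, timeout=5.0):
    """Hypothetical helper: ask Tor for new circuits via the control port."""
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("rb") as reader:

        def send(command):
            # Control-protocol commands are CRLF-terminated text lines.
            sock.sendall(command.encode("ascii") + b"\r\n")
            reply = reader.readline().decode("ascii", "replace").strip()
            # Successful replies start with status code 250.
            if not reply.startswith("250"):
                raise RuntimeError(f"{command.split()[0]} failed: {reply}")
            return reply

        send(f'AUTHENTICATE "{password}"')
        send("SIGNAL NEWNYM")
        send("QUIT")
```

Inside the middleware, the whole telnetlib block could then be replaced by a single `renew_tor_identity("267765")` call.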

Then enable it in settings.py:

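```python
DOWNLOADER_MIDDLEWARES = {
    # Disable the stock RetryMiddleware; the subclass replaces it.
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'spider.middlewares.RetryChangeProxyMiddleware': 600,
}
```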

Finally, route the requests through Tor, either via a local HTTP proxy such as Polipo or by wrapping the whole crawl with tsocks:

```shell
tsocks scrapy crawl spider
```
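For the Polipo route, Scrapy's built-in HttpProxyMiddleware honours `request.meta["proxy"]`, so a small downloader middleware can point every request at the local proxy instead of relying on tsocks. This is a hedged sketch: `TorHttpProxyMiddleware` and the `TOR_HTTP_PROXY` setting are made-up names, and it assumes Polipo listens on its default port 8123 and forwards to Tor's SOCKS port.

```python
class TorHttpProxyMiddleware:
    """Hypothetical downloader middleware: send requests through a local
    HTTP proxy (e.g. Polipo) that forwards to Tor."""

    DEFAULT_PROXY = "http://127.0.0.1:8123"  # Polipo's default port (assumed)

    def __init__(self, proxy_url=DEFAULT_PROXY):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # TOR_HTTP_PROXY is a made-up setting name for this sketch.
        return cls(crawler.settings.get("TOR_HTTP_PROXY", cls.DEFAULT_PROXY))

    def process_request(self, request, spider):
        # Keep any proxy a request already sets; otherwise use the local one.
        request.meta.setdefault("proxy", self.proxy_url)
        return None  # let the other middlewares continue
```

Register it in `DOWNLOADER_MIDDLEWARES` with a priority below 750 (for example 100) so it runs before the built-in HttpProxyMiddleware reads `meta["proxy"]`.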

Can anyone confirm that this approach works and that you actually get different IP addresses?
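One way to check this yourself is to ask an IP-echo service, through the same proxy, which address it sees, before and after sending NEWNYM. The sketch assumes the Tor Project's check endpoint at `https://check.torproject.org/api/ip` returns JSON of the form `{"IsTor": true, "IP": "..."}`, and that the proxy is Polipo on 127.0.0.1:8123; both function names are hypothetical.

```python
import json
import urllib.request

CHECK_URL = "https://check.torproject.org/api/ip"  # assumed JSON endpoint


def parse_check_response(body):
    """Return (is_tor, ip) from a body like {"IsTor": true, "IP": "1.2.3.4"}."""
    data = json.loads(body)
    return bool(data.get("IsTor")), data.get("IP")


def current_exit_ip(proxy_url="http://127.0.0.1:8123", timeout=30):
    # Route the check through the same local proxy the spider uses.
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    )
    with opener.open(CHECK_URL, timeout=timeout) as resp:
        return parse_check_response(resp.read().decode("utf-8"))
```

Calling `current_exit_ip()` before and after a NEWNYM should usually show a different IP, but not always: Tor rate-limits NEWNYM, and a new circuit can still end at the same exit relay.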

1 answer

I used this snippet, which sets a random user agent for each request: http://snipplr.com/view/66992/use-a-random-user-agent-for-each-request/

Update: fixed incorrect link
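The linked snippet is not reproduced here; the following is an independent sketch of the same idea as a Scrapy downloader middleware. `RandomUserAgentMiddleware` and its user-agent strings are illustrative placeholders, and the optional `rng` parameter exists only to make the choice reproducible.

```python
import random


class RandomUserAgentMiddleware:
    """Hypothetical downloader middleware: random User-Agent per request."""

    # Placeholder strings; substitute your own list.
    USER_AGENTS = [
        "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    ]

    def __init__(self, user_agents=None, rng=None):
        self.user_agents = list(user_agents or self.USER_AGENTS)
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        # Overwrite any User-Agent header already present on the request.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        return None  # continue normal processing
```

Note that rotating the User-Agent only changes how requests look; the source IP still depends on the Tor setup in the question.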


Source: https://habr.com/ru/post/921088/