Python: Scrapy using a proxy

I want to use a proxy IP for web scraping with Scrapy. To use the proxy server, I set the http_proxy environment variable as indicated in the documentation:

$ export http_proxy=http://proxy:port
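Scrapy's HttpProxyMiddleware picks this variable up through the standard library's proxy detection, so a quick way to confirm the variable is visible to Python at all is the following check (a minimal sketch, assuming Python 2 to match the print syntax used below):

import urllib

# Should print something like {'http': 'http://proxy:port'} if the
# export above is visible to this shell's Python process.
print urllib.getproxies()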

To check whether the IP change works, I created a new spider named test:

from scrapy.contrib.spiders import CrawlSpider

class TestSpider(CrawlSpider):
    name = "test"
    allowed_domains = ["whatismyip.com"]
    start_urls = ["http://whatismyip.com"]

    def parse(self, response):
        # Save the page so the reported IP can be inspected afterwards
        print response.body
        open('check_ip.html', 'wb').write(response.body)

But when I run this spider, check_ip.html does not show the proxy IP from the environment variable; instead it shows my original IP address, the same as before the crawl.

What is the problem? Is there an alternative way to check whether I am actually going through the proxy server, or some other way to use a proxy IP?
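(For reference, one way to make this check unambiguous is to target a service that echoes the caller's IP back as JSON, e.g. http://httpbin.org/ip; a minimal sketch in the same old-style Scrapy API, with the service my own choice rather than anything from the original post:)

from scrapy.spider import BaseSpider

class IPCheckSpider(BaseSpider):
    name = "ipcheck"
    start_urls = ["http://httpbin.org/ip"]

    def parse(self, response):
        # httpbin answers with JSON like {"origin": "1.2.3.4"}; through a
        # working proxy this should be the proxy's address, not yours.
        print response.body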

1 answer

In settings.py, enable the HttpProxyMiddleware in DOWNLOADER_MIDDLEWARES:

DOWNLOADER_MIDDLEWARES = {
    # you need this line in order to scrape through a proxy / proxy list
    'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 110,
}
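The same middleware also honours a per-request proxy set through request meta, which avoids relying on the environment variable entirely. A minimal sketch (the proxy URL is a placeholder and the spider is my own illustration, not part of the answer):

from scrapy.http import Request
from scrapy.spider import BaseSpider

class ProxyMetaSpider(BaseSpider):
    name = "proxymeta"
    start_urls = ["http://whatismyip.com"]

    def start_requests(self):
        for url in self.start_urls:
            # HttpProxyMiddleware uses request.meta['proxy'] when it is set,
            # falling back to the *_proxy environment variables otherwise.
            yield Request(url, meta={'proxy': 'http://proxy:port'})

    def parse(self, response):
        print response.body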

Source: https://habr.com/ru/post/1540182/

