Python urllib over tor?

Code example:

    #!/usr/bin/python
    import socks
    import socket
    import urllib2

    socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS4, "127.0.0.1", 9050, True)
    socket.socket = socks.socksocket
    print urllib2.urlopen("http://almien.co.uk/m/tools/net/ip/").read()

Tor runs its SOCKS proxy on port 9050 by default. The request does go through Tor and arrives from an IP address other than mine. However, the Tor console prints a warning:

"Feb 28 22:44:26.233 [warn] Your application (using socks4 to port 80) is giving Tor only an IP address. Applications that do DNS resolves themselves may leak information. Consider using Socks4A (e.g. via privoxy or socat) instead. For more information, please see https://wiki.torproject.org/TheOnionRouter/TorFAQ#SOCKSAndDNS."

That is, DNS lookups are not going through the proxy. But isn't that exactly what the fourth parameter of setdefaultproxy is supposed to take care of?

From http://socksipy.sourceforge.net/readme.txt :

setproxy(proxytype, addr[, port[, rdns[, username[, password]]]])

rdns is a boolean flag that changes how DNS resolution is done. If it is set to True, DNS resolution is performed remotely, on the server.

The effect is the same whether PROXY_TYPE_SOCKS4 or PROXY_TYPE_SOCKS5 is selected.

It cannot be a local DNS cache (if urllib2 even has one), because the warning still appears when I change the URL to a domain that this computer has never visited before.
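For context on the warning, here is a rough Python 3 sketch of what the two request types look like on the wire. A plain SOCKS4 CONNECT request only has room for an IPv4 address, so the client has to resolve the name itself; SOCKS4a puts the placeholder address 0.0.0.1 in that field and appends the hostname so the proxy can resolve it. The function names are made up for illustration and are not part of SocksiPy.

```python
import socket
import struct


def socks4_request(ip, port, user_id=b""):
    """Plain SOCKS4 CONNECT: the client must already know the IPv4 address."""
    # version 4, command 1 (CONNECT), 2-byte port, 4-byte IPv4 address
    header = struct.pack(">BBH", 4, 1, port) + socket.inet_aton(ip)
    return header + user_id + b"\x00"


def socks4a_request(hostname, port, user_id=b""):
    """SOCKS4a CONNECT: the hostname is sent so the proxy does the lookup."""
    # 0.0.0.1 is a deliberately invalid address that signals "hostname follows"
    header = struct.pack(">BBH", 4, 1, port) + b"\x00\x00\x00\x01"
    return header + user_id + b"\x00" + hostname.encode("ascii") + b"\x00"
```

If Tor receives the first form, the name was necessarily resolved before the request was built, which is what the warning is complaining about.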

+16
python urllib2 tor socks
Feb 28 2018-11-22T00:
3 answers

The problem is that httplib.HTTPConnection uses the socket module's create_connection helper function, which does the DNS lookup with the regular getaddrinfo call before connecting the socket.
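One way to check this yourself, sketched for Python 3 (where the helper behaves the same way as far as I know): temporarily replace socket.getaddrinfo with a stub that records the hostname and refuses the lookup, then call socket.create_connection. Nothing touches the network. The function name hostnames_resolved_locally is hypothetical.

```python
import socket


def hostnames_resolved_locally(host, port):
    """Return the names socket.create_connection tries to resolve itself."""
    seen = []
    original = socket.getaddrinfo

    def recording_getaddrinfo(name, *args, **kwargs):
        # Record the name, then fail so no real lookup or connection happens.
        seen.append(name)
        raise socket.gaierror("lookup blocked for demonstration")

    socket.getaddrinfo = recording_getaddrinfo
    try:
        socket.create_connection((host, port), timeout=1)
    except socket.gaierror:
        pass
    finally:
        # Always restore the real resolver.
        socket.getaddrinfo = original
    return seen
```

If the returned list contains the hostname, the lookup happened in the local process, regardless of which class socket.socket points to.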

The solution is to write your own create_connection function and monkey-patch it into the socket module before importing urllib2, just as is done with the socket class.

    import socks
    import socket

    def create_connection(address, timeout=None, source_address=None):
        sock = socks.socksocket()
        sock.connect(address)
        return sock

    socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9050)

    # patch the socket module
    socket.socket = socks.socksocket
    socket.create_connection = create_connection

    import urllib2

    # Now you can go ahead and scrape those shady darknet .onion sites
+17
Dec 17 '12 at 3:32

The problem is that you are importing urllib2 before you set up the SOCKS connection.

Try this instead:

    import socks
    import socket

    socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS4, '127.0.0.1', 9050, True)
    socket.socket = socks.socksocket

    import urllib2
    print urllib2.urlopen("http://almien.co.uk/m/tools/net/ip/").read()

Manual request example:

    import socks
    import urlparse

    SOCKS_HOST = 'localhost'
    SOCKS_PORT = 9050
    SOCKS_TYPE = socks.PROXY_TYPE_SOCKS5

    url = 'http://www.whatismyip.com/automation/n09230945.asp'
    parsed = urlparse.urlparse(url)

    socket = socks.socksocket()
    socket.setproxy(SOCKS_TYPE, SOCKS_HOST, SOCKS_PORT)
    socket.connect((parsed.netloc, 80))
    socket.send('''GET %(uri)s HTTP/1.1
    host: %(host)s
    connection: close

    ''' % dict(
        uri=parsed.path,
        host=parsed.netloc,
    ))

    print socket.recv(1024)
    socket.close()
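If you are on Python 3, the request-building step of the manual example might look roughly like the sketch below. It assumes urllib.parse in place of urlparse, uses the CRLF line endings that HTTP specifies, and falls back to "/" when the URL has no path. The helper name build_get_request is hypothetical; the returned bytes would be passed to the socket's send method as above.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Build a minimal HTTP/1.1 GET request for url as bytes."""
    parts = urlsplit(url)
    # An empty path is not a valid request target, so default to "/".
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % parts.netloc,
        "Connection: close",
        "",  # blank line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```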
+4
Feb 28 2018-11-28T00:

I published an article with full source code showing how to use urllib2 + SOCKS + Tor at http://blog.databigbang.com/distributed-scraping-with-multiple-tor-circuits/

Hope it solves your problems.

+3
Dec 16 '11 at 18:13