Testing HTTPS proxies in Python

I manage many HTTPS proxies (proxies that have their own SSL connection). I am writing a diagnostic tool in Python that tries to fetch a page through each proxy and emails me if it cannot connect through one of them.

The way I decided to do this is to use urllib to connect through each proxy and fetch a page that should say "success", with the code below.

    import urllib  # Python 2

    def fetch(url):
        connection = urllib.urlopen(url, proxies={'http': "https://" + server + ':443'})
        return connection.read()

    print fetch(testURL)

This fetches the page I want. The problem is that it will still fetch the page even if the proxy information is wrong or the proxy server is down. So either it never uses the proxy at all, or it tries the proxy and then connects without it when that fails.

How can I fix this?

Edit: It seems no one knows how to do this. I'm going to start reading through other languages' libraries to see if they handle it better. Does anyone know if this is easier in another language, like Go?

Edit: I wrote this in a comment below, but I think there may be a misunderstanding: "The proxy server has its own SSL connection. So if I go to google.com, I first do a key exchange with the proxy, foo.com, and then another with the destination, bar.com or baz.com. The destination does not have to be HTTPS; the proxy is HTTPS."

+5
4 answers

Most people understand an HTTPS proxy to mean a proxy server that understands the CONNECT request. My example creates a direct SSL connection to the proxy instead.

    try:
        import http.client as httplib  # for Python 3.2+
    except ImportError:
        import httplib                 # for Python 2.7

    con = httplib.HTTPSConnection('proxy', 443)  # create the proxy connection
    # download http://example.com/ through the proxy
    con.putrequest('GET', 'http://example.com/', skip_host=True)
    con.putheader('Host', 'example.com')
    con.endheaders()
    res = con.getresponse()
    print(res.read())

If your proxy is a reverse proxy, change

 con.putrequest('GET', 'http://example.com/', skip_host=True) 

to

 con.putrequest('GET', '/', skip_host=True)
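
For comparison, if the proxy is a conventional CONNECT proxy rather than the direct-SSL kind described in the question, a minimal sketch using http.client's set_tunnel() could look like the following; the 'proxy' hostname and port 3128 are placeholders:

    try:
        import http.client as httplib  # Python 3.2+
    except ImportError:
        import httplib                 # Python 2.7

    # Plain TCP to the proxy, a CONNECT request, then TLS with the destination.
    con = httplib.HTTPSConnection('proxy', 3128)
    con.set_tunnel('example.com', 443)
    con.request('GET', '/')
    res = con.getresponse()
    print(res.status)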
+2

I assume it does not work for HTTPS requests, is that right? If so, the code above only defines a proxy for http. Try adding one for https as well:

proxies={'https':"https://"+server+':443'}

Another option is to use the requests module instead of urllib. Take a look at http://docs.python-requests.org/en/latest/user/advanced/#proxies
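
A minimal sketch of how that might look with requests (not taken from the linked page); the function name, test URL and 30-second timeout are my own, and whether your requests/urllib3 version really speaks TLS to the proxy itself is worth verifying:

    import requests

    def check_proxy(server, test_url='http://example.com/'):
        proxies = {
            'http':  'https://' + server + ':443',
            'https': 'https://' + server + ':443',
        }
        try:
            r = requests.get(test_url, proxies=proxies, timeout=30)
            return r.status_code == 200
        except requests.RequestException:
            return False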

+1

From reading the code, urllib doesn't seem to support this, and it's unclear whether urllib2 does. But what about using curl (or pycurl)? It is generally the more capable HTTP client API (and more complex, which is part of why urllib and friends exist).

The curl command-line tool's man page looks promising:

    -x, --proxy <[protocol://][user:password@]proxyhost[:port]>

         Use the specified HTTP proxy. If the port number is not specified,
         it is assumed at port 1080.

         This option overrides existing environment variables that set the
         proxy to use. If there's an environment variable setting a proxy,
         you can set proxy to "" to override it.

         All operations that are performed over an HTTP proxy will
         transparently be converted to HTTP. It means that certain protocol
         specific operations might not be available. This is not the case if
         you can tunnel through the proxy, as with the -p, --proxytunnel
         option.

         User and password that might be provided in the proxy string are URL
         decoded by curl. This allows you to pass in special characters such
         as @ by using %40 or pass in a colon with %3a.

         The proxy host can be specified the exact same way as the proxy
         environment variables, including the protocol prefix (http://) and
         the embedded user + password.

         From 7.21.7, the proxy string may be specified with a protocol://
         prefix to specify alternative proxy protocols. Use socks4://,
         socks4a://, socks5:// or socks5h:// to request the specific SOCKS
         version to be used. If no protocol is specified, http:// and all
         others will be treated as HTTP proxies.

         If this option is used several times, the last one will be used.
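
Not in the original answer, but a rough pycurl sketch along those lines; fetch_via_proxy is a made-up name, and an https:// proxy prefix needs a reasonably recent libcurl (7.52 or later, if I remember correctly):

    import pycurl
    from io import BytesIO

    def fetch_via_proxy(url, server):
        buf = BytesIO()
        c = pycurl.Curl()
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.PROXY, 'https://' + server + ':443')  # TLS to the proxy itself
        c.setopt(pycurl.WRITEFUNCTION, buf.write)
        c.setopt(pycurl.TIMEOUT, 30)
        try:
            c.perform()  # raises pycurl.error if the proxy is unreachable
            return buf.getvalue()
        finally:
            c.close()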
+1

What about using a timeout? If the proxy does not respond within 30 seconds, it is marked as not connected.

    import urllib2  # Python 2

    def fetch(url, server):
        proxy_handler = urllib2.ProxyHandler({'http': 'https://' + server + ':443'})
        opener = urllib2.build_opener(proxy_handler, urllib2.HTTPHandler(debuglevel=0))
        urllib2.install_opener(opener)
        try:
            response = opener.open(url, timeout=30)
            return response.read()
        except:
            print "Can't connect with proxy %s" % (server)

    print fetch(url, serverIp)

You can change debuglevel to 1 to see the connection details.

I use this with proxies all over the world, and with my internet connection 30 seconds is the longest I have to wait to find out whether I am connected or not. In my tests, a connection that took longer than 30 seconds always ended in failure.
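
To connect this back to the question's goal of emailing when a proxy fails, a rough sketch of the surrounding loop, assuming the fetch() above (which returns None on failure), testURL from the question, a list of proxy hosts, and a local SMTP server; all names and addresses are placeholders:

    import smtplib
    from email.mime.text import MIMEText

    servers = ['proxy1.example.com', 'proxy2.example.com']
    failed = [s for s in servers if fetch(testURL, s) is None]

    if failed:
        msg = MIMEText('Unreachable proxies:\n' + '\n'.join(failed))
        msg['Subject'] = 'Proxy check failed'
        msg['From'] = 'monitor@example.com'
        msg['To'] = 'admin@example.com'
        smtp = smtplib.SMTP('localhost')
        smtp.sendmail(msg['From'], [msg['To']], msg.as_string())
        smtp.quit()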

0

Source: https://habr.com/ru/post/1201786/

