Attempting to access the Internet using urllib2 in Python

I am trying to write a program that (among other things) will retrieve text or source code from a predefined website. I am learning Python for this, and most sources told me to use urllib2. As a test, I tried this code:

    import urllib2
    response = urllib2.urlopen('http://www.python.org')
    html = response.read()

Instead of behaving in any expected way, the shell just sits there, as if waiting for input. There is not even a ">>>" or "...". The only way to get out of this state is with [Ctrl]+C. When I do that, I get a whole bunch of error messages like:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/m/mls/pkg/ix86-Linux-RHEL5/lib/python2.5/urllib2.py", line 124, in urlopen
        return _opener.open(url, data)
      File "/m/mls/pkg/ix86-Linux-RHEL5/lib/python2.5/urllib2.py", line 381, in open
        response = self._open(req, data)

I would appreciate any feedback. Is there a tool other than urllib2 I should use, or can you give me tips on how to fix this? I use a networked computer at work, and I'm not quite sure how the shell is configured or how that might affect anything.

4 answers

With 99.999% probability, this is a proxy problem. Python is incredibly bad at finding the right HTTP proxy to use, and when it cannot find one, it just hangs and eventually times out.

So first you need to find out which proxy server to use: check your browser settings (Tools → Internet Options → Connections → LAN Settings... in IE, etc.). If it uses a script for auto-configuration, you will need to fetch that script (which should be some kind of JavaScript) and find out where your request is supposed to go. If no script is specified and the "automatically detect" option is checked, you can simply ask an IT person at your company which proxy to use.
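A quick way to see which proxies Python picks up on its own is urllib.getproxies(); a minimal diagnostic sketch (standard library only):

    import urllib

    # Shows the proxies Python auto-detects from the environment
    # (http_proxy / https_proxy variables, or the registry on Windows).
    # An empty dict on a proxied network would explain the hang.
    print urllib.getproxies()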

I assume you are using Python 2.x. From the Python docs on urllib:

    # Use http://www.someproxy.com:3128 for http proxying
    proxies = {'http': 'http://www.someproxy.com:3128'}
    filehandle = urllib.urlopen(some_url, proxies=proxies)

Note that a ProxyHandler figuring out the default values is already what happens when you use urlopen, so this probably won't work.

If you really want urllib2, you will need to specify a ProxyHandler, like the example on this page. Authentication may or may not be required (usually it is not).
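For example, a minimal sketch (the proxy host and port here are made up; substitute the ones you found in your browser settings):

    import urllib2

    # Hypothetical proxy address -- replace with your actual proxy.
    proxy = urllib2.ProxyHandler({'http': 'http://proxy.example.com:3128'})
    opener = urllib2.build_opener(proxy)
    urllib2.install_opener(opener)  # later urlopen calls go through the proxy

    response = urllib2.urlopen('http://www.python.org')
    html = response.read()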


This is not a good answer to "How to do this with urllib2", but let me suggest python-requests. The whole reason it exists is that the author found urllib2 to be a cumbersome mess. And he is probably right.
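For comparison, a minimal sketch with requests (the proxy address is hypothetical; drop the proxies argument entirely on an unproxied network):

    import requests

    # Hypothetical proxy address -- replace with your actual proxy.
    proxies = {'http': 'http://proxy.example.com:3128'}
    response = requests.get('http://www.python.org', proxies=proxies, timeout=5)
    html = response.text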


That is very strange; have you tried a different URL? Otherwise there is httplib, though it is more complicated. Here is your example using httplib:

    import httplib as h

    domain = h.HTTPConnection('www.python.org')
    domain.connect()
    domain.request('GET', '/fish.html')
    response = domain.getresponse()
    if response.status == h.OK:
        html = response.read()

I get a 404 error almost immediately (with no hanging):

    >>> import urllib2
    >>> response = urllib2.urlopen('http://www.python.org/fish.html')
    Traceback (most recent call last):
      ...
    urllib2.HTTPError: HTTP Error 404: Not Found

If I try to contact an address where no HTTP server is running, it hangs for a long time until the timeout occurs. You can shorten the wait by passing a timeout parameter to urlopen (added in Python 2.6):

    >>> response = urllib2.urlopen('http://cs.princeton.edu/fish.html', timeout=5)
    Traceback (most recent call last):
      ...
    urllib2.URLError: <urlopen error timed out>
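If you would rather handle the failure than get a traceback, a minimal sketch catching both error types (HTTPError is a subclass of URLError, so it must be caught first):

    # Python 2.6+ (for the timeout parameter and "except ... as" syntax).
    import urllib2

    try:
        response = urllib2.urlopen('http://cs.princeton.edu/fish.html', timeout=5)
        html = response.read()
    except urllib2.HTTPError as e:
        print 'Server returned an error:', e.code      # e.g. 404
    except urllib2.URLError as e:
        print 'Could not reach the server:', e.reason  # e.g. timed out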
