Python urllib.urlopen IOError

Question

So, I have the following lines of code in a function

sock = urllib.urlopen(url)
html = sock.read()
sock.close()

and they work fine when I call the function manually. However, when I call the function in a loop (using the same URLs as before), I get the following error:

Traceback (most recent call last):
  File "./headlines.py", line 256, in <module>
    main(argv[1:])
  File "./headlines.py", line 37, in main
    write_articles(headline, output_folder + "articles_" + term +"/")
  File "./headlines.py", line 232, in write_articles
    print get_blogs(headline, 5)
  File "/Users/michaelnussbaum08/Documents/College/Sophmore_Year/Quarter_2/Innovation/Headlines/_code/get_content.py", line 41, in get_blogs
    sock = urllib.urlopen(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 87, in urlopen
    return opener.open(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 203, in open
    return getattr(self, name)(url)
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib.py", line 314, in open_http
    if not host: raise IOError, ('http error', 'no host given')
IOError: [Errno http error] no host given

Any ideas?

Edit: here is the code:

def get_blogs(term, num_results):
    search_term = term.replace(" ", "+")
    print "search_term: " + search_term
    url = 'http://blogsearch.google.com/blogsearch_feeds?hl=en&q='+search_term+'&ie=utf-8&num=10&output=rss'
    print "url: " +url  

    #error occurs on line below

    sock = urllib.urlopen(url)
    html = sock.read()
    sock.close()

def write_articles(headline, output_folder, num_articles=5):

    #calls get_blogs

    if not os.path.exists(output_folder):
        os.makedirs(output_folder)

    output_file = output_folder+headline.strip("\n")+".txt"
    f = open(output_file, 'a')
    articles = get_articles(headline, num_articles)
    blogs = get_blogs(headline, num_articles)


    #NEW FUNCTION
    #the loop that calls write_articles
    for term in trend_list:
        if do_find_max == True:
            fill_search_term(term, output_folder)
        headlines = headline_process(term, output_folder, max_headlines, do_find_max)
        for headline in headlines:
            try:
                write_articles(headline, output_folder + "articles_" + term + "/")
            except UnicodeEncodeError:
                pass

+3

python urllib

Michael Apr 20 '10 at 3:19

3 answers

I had this problem when the variable I was concatenating into the URL (in your case, search_term in the line

url = 'http://blogsearch.google.com/blogsearch_feeds?hl=en&q='+search_term+'&ie=utf-8&num=10&output=rss'

) had a newline at the end. So make sure you do

search_term = search_term.strip()

You may also want to do

search_term = urllib2.quote(search_term)

to make sure the string is safe to use in a URL.
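The two steps above can be combined into one small helper. This is only a sketch: build_search_url and BASE_URL are hypothetical names, and quote_plus (from urllib on Python 2, urllib.parse on Python 3) is swapped in for the manual replace(" ", "+") in the question, since it escapes unsafe characters and turns spaces into '+' in one go.

```python
try:
    from urllib import quote_plus        # Python 2
except ImportError:
    from urllib.parse import quote_plus  # Python 3

# Hypothetical constant: the feed address used in the question
BASE_URL = 'http://blogsearch.google.com/blogsearch_feeds'


def build_search_url(term):
    """Build the feed URL from a raw search term (hypothetical helper)."""
    # Drop stray whitespace such as a trailing newline
    cleaned = term.strip()
    # Escape unsafe characters; spaces become '+'
    query = quote_plus(cleaned)
    return BASE_URL + '?hl=en&q=' + query + '&ie=utf-8&num=10&output=rss'
```

With this, a headline read from a file with its trailing newline still attached, such as "python urllib\n", should produce a URL with q=python+urllib and no newline in it.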

+6

user1994702 28 . '13 10:16

Print the url just before calling urlopen, like this:

print(url)
sock = urllib.urlopen(url)

That way, when you run the script and get an IOError, you will see the url that causes the problem. The "no host given" error can be reproduced if url is equal to something like 'http://'.
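Instead of reading the printed output by eye, the same check could be done in code before the request is made. The helper below is a rough sketch, not part of urllib: describe_url_problem is a hypothetical name, and it only looks for the two symptoms discussed in this thread (whitespace inside the url, and a url with no host).

```python
try:
    from urlparse import urlparse          # Python 2
except ImportError:
    from urllib.parse import urlparse      # Python 3


def describe_url_problem(url):
    """Return a short description of what looks wrong, or None (hypothetical helper)."""
    # A newline, tab or space anywhere in the url is suspicious
    if any(ch.isspace() for ch in url):
        return "url contains whitespace: %r" % (url,)
    # 'http://' parses to an empty network location, i.e. no host
    if not urlparse(url).netloc:
        return "url has no host: %r" % (url,)
    return None
```

Calling it right before urlopen and printing any non-None result should point at the offending url in the loop without changing anything else.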

+1

unutbu Apr 20 '10 at 3:25

Eddy pronk · Accepted Answer · 2010-04-20T03:36:38+0000

Use urllib2 instead if you don't want to handle block-based reads yourself. This probably does what you expect:

import urllib2
req = urllib2.Request(url='http://stackoverflow.com/')
f = urllib2.urlopen(req)
print f.read()
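For comparison with the open/read/close lines in the question, here is a minimal sketch of the same fetch written around urllib2.urlopen (urllib.request.urlopen on Python 3). The fetch name and its opener parameter are assumptions made for illustration; the parameter exists only so the function can be tried without a network connection.

```python
from contextlib import closing

try:
    from urllib2 import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen   # Python 3


def fetch(url, opener=urlopen):
    """Return the body at url, always closing the response (hypothetical helper)."""
    # closing() calls .close() on the response even if read() raises
    with closing(opener(url)) as sock:
        return sock.read()
```

This does not fix a malformed url by itself; it only makes sure the response is closed when a read fails partway through the loop.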
