Urllib2 does not return a full web page

Question


I am just starting out in Python and I am trying to request the HTML source of a site using urllib2. However, when I fetch the page, I do not get the full HTML content: some of the tags are missing. I know they are missing because they do appear in the code when I inspect the site in Firebug. Is this due to the way I request the data, or because of the site? And is there a way to get the full source of the site in Python so that I can parse it?

Currently, the code I'm using to request the content, and the site I'm requesting it from, is:

    import urllib2

    url = 'http://marinetraffic.com/ais/'
    response = urllib2.urlopen(url)
    html = response.read()
    print(html)

In particular, the content inside <div id="map_area"> is missing. Any help or pointers would be really appreciated!

+4

python web scraping

user1242670 Mar 01 '12 at 13:18


2 answers

Calling read() on the handle returned by urlopen returns only what has already been loaded, so you can end up with a short read. You are better off using urllib.urlretrieve(), which tries to retrieve the entire file, checks the Content-Length header, and raises an error if the download falls short.
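A minimal sketch of that suggestion, assuming all you want is to detect a truncated download. The helper name fetch_to_file is hypothetical, and the import fallback is there only so that the same code runs on Python 2 (as in the question) and on Python 3, where urlretrieve lives in urllib.request. Note that this guards against short transfers only; it does not run any scripts on the page.

```python
try:
    # Python 3
    from urllib.request import urlretrieve
    from urllib.error import ContentTooShortError
except ImportError:
    # Python 2, as used in the question
    from urllib import urlretrieve, ContentTooShortError


def fetch_to_file(url, filename):
    """Download url into filename; return True if the whole body arrived.

    fetch_to_file is a hypothetical helper name, not part of urllib.
    """
    try:
        urlretrieve(url, filename)
    except ContentTooShortError:
        # Raised when fewer bytes arrive than the Content-Length header promised.
        return False
    return True
```

For the page in the question, this would be called as fetch_to_file('http://marinetraffic.com/ais/', 'ais.html'), where the output file name is an arbitrary choice.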

0

alexis Mar 01 '12 at 14:37


plaes · Accepted Answer · 2012-03-01T13:23:49+0000

You get incomplete data because most of the content on this page is generated dynamically by JavaScript ...
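One way to check this for yourself is to look at what the downloaded HTML actually contains for the element in question. The sketch below uses only the standard library; ElementProbe and probe are hypothetical names, and the code assumes the element can be identified by its id attribute. If the explanation above applies, the element should be found but have little or nothing inside it, because the browser fills it in only after running the page's scripts.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2, as used in the question


class ElementProbe(HTMLParser):
    """Record whether an element with a given id exists and what it contains."""

    def __init__(self, element_id):
        HTMLParser.__init__(self)
        self.element_id = element_id
        self.found = False
        self.tag = None       # tag name of the matched element
        self.depth = 0        # nesting depth of same-named tags inside it
        self.child_tags = 0   # number of tags opened inside the element
        self.text = []        # text fragments seen inside the element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.child_tags += 1
            if tag == self.tag:
                self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.element_id:
            self.found = True
            self.tag = tag
            self.depth = 1

    def handle_endtag(self, tag):
        # Only same-named tags change the depth, so void tags such as
        # <img> or <br> cannot unbalance the count.
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


def probe(html, element_id):
    """Return (found, number of child tags, stripped text) for element_id."""
    parser = ElementProbe(element_id)
    parser.feed(html)
    parser.close()
    return parser.found, parser.child_tags, "".join(parser.text).strip()
```

With the html string from the question's code, probe(html, "map_area") reports whether the div is present in the static source and how much it holds. On Python 3, response.read() returns bytes, so decode it first, for example with html.decode("utf-8", "replace").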
