urllib2 does not return the full web page

I am just starting out in Python and I am trying to request the HTML source of a site using urllib2. However, the HTML I get back is incomplete: some tags are missing. I know they are missing because they appear in the code when I browse the site in Firebug. Is this due to the way I request the data, or because of the site? Either way, can I get the full source code of the site in Python and then parse it?

The code I'm currently using to request the content, along with the site I'm trying to fetch, is:

```python
import urllib2

url = 'http://marinetraffic.com/ais/'
response = urllib2.urlopen(url)  # send a GET request for the page
html = response.read()           # read the response body as a string
print(html)
```

In particular, the content inside <div id="map_area"> is missing. Any help or pointers would be really appreciated!

2 answers

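You get incomplete data because most of the content on this page is generated dynamically by JavaScript. urllib2 only downloads the HTML that the server sends and does not run any scripts, whereas Firebug shows the page after the browser has executed them.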
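To check this for yourself, you can test whether the map_area container in the downloaded HTML is empty. The following is a minimal sketch using Python 3's html.parser module (the same parser lives in the HTMLParser module in Python 2). ElementContentChecker and element_is_empty are hypothetical helpers, and the two HTML strings are made-up stand-ins, not the real marinetraffic.com markup.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ElementContentChecker(HTMLParser):
    """Record the tags and text found inside the element whose id is target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False     # did we see the target element at all?
        self.depth = 0         # > 0 while we are inside the target element
        self.inner_tags = []   # start tags seen inside the target
        self.inner_text = []   # non-blank text seen inside the target

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.inner_tags.append(tag)
            if tag not in VOID_TAGS:
                self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # <tag/> opens and closes in one step, so the depth does not change.
        if self.depth:
            self.inner_tags.append(tag)
        elif dict(attrs).get("id") == self.target_id:
            self.found = True  # a self-closed target is empty by definition

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.inner_text.append(data.strip())


def element_is_empty(html, target_id):
    """Return True if the element with id=target_id has no child tags or text."""
    checker = ElementContentChecker(target_id)
    checker.feed(html)
    checker.close()
    if not checker.found:
        raise ValueError("no element with id=%r" % target_id)
    return not (checker.inner_tags or checker.inner_text)


# Made-up markup: roughly what a server sends before any JavaScript runs...
static_html = '<body><div id="map_area"></div><script src="map.js"></script></body>'
# ...versus what a browser's inspector might show after the script has run.
rendered_html = '<body><div id="map_area"><canvas></canvas></div></body>'

print(element_is_empty(static_html, "map_area"))    # True
print(element_is_empty(rendered_html, "map_area"))  # False
```

If element_is_empty returns True for the HTML that urllib2 gives you, the missing markup is being added in the browser, and no change to the request itself will make it appear.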


Calling read() on the handle returned by urlopen gives you only the data that has actually arrived, so an interrupted transfer can leave you with a short read and no error. You are better off using urllib.urlretrieve(), which downloads the entire file, compares the number of bytes received with the Content-Length header, and raises urllib.ContentTooShortError if it got less.
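The same kind of check can be applied by hand to a response from urlopen. Below is a minimal Python 3 sketch, assuming you pass in the raw Content-Length header value; read_fully and ShortReadError are hypothetical names, not part of urllib, and an in-memory io.BytesIO stream stands in for a real network response.

```python
import io


class ShortReadError(IOError):
    """Raised when fewer bytes arrive than Content-Length announced (hypothetical)."""


def read_fully(stream, content_length=None, chunk_size=8192):
    """Read `stream` until EOF and verify the total size against Content-Length.

    `content_length` is the raw header value (str or int), or None when the
    server did not send the header, in which case no check is possible.
    """
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # b"" signals end of stream
            break
        chunks.append(chunk)
    body = b"".join(chunks)
    if content_length is not None and len(body) < int(content_length):
        raise ShortReadError(
            "expected %s bytes, got %d" % (content_length, len(body)))
    return body


# Offline demo: in-memory streams stand in for network responses.
print(len(read_fully(io.BytesIO(b"<html></html>"), "13")))  # 13
try:
    read_fully(io.BytesIO(b"<html>"), "13")  # body cut off after 6 bytes
except ShortReadError as exc:
    print(exc)  # expected 13 bytes, got 6

# With a real response (not run here), the call would look like:
#   from urllib.request import urlopen
#   with urlopen("http://marinetraffic.com/ais/") as resp:
#       html = read_fully(resp, resp.headers.get("Content-Length"))
```

This only guards against truncated transfers. It cannot recover markup that JavaScript adds after the page has loaded.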


Source: https://habr.com/ru/post/1399184/

