Python - urlretrieve for an entire web page

With urllib.urlretrieve('http://page.com', 'page.html') I can save only the index page of the site. Can urlretrieve do something similar to wget -r, i.e. download the whole structure of the site with all of its HTML files?


1 answer

Not directly.

If you want to crawl an entire site, look at mechanize: http://wwwsearch.sourceforge.net/mechanize/

This will allow you to load the page and follow its links.

Something like this:

 import mechanize

 br = mechanize.Browser()
 br.open('http://stackoverflow.com')

 for link in br.links():
     print(link)
     response = br.follow_link(link)
     html = response.read()
     # save your downloaded page here
     br.back()

In its current form, this will only fetch pages one link away from your starting point. You can easily adapt it to cover an entire site, though; a sketch of one such adaptation follows.
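For illustration, here is a minimal sketch of that adaptation: a breadth-first crawl that keeps a queue of unvisited URLs and a set of already-seen ones, restricted to the starting host. It assumes the mechanize fork that runs on modern Python; the start URL and the saving step are placeholders, not part of the original answer.

 import mechanize
 from collections import deque
 from urllib.parse import urlparse

 start = 'http://stackoverflow.com'   # placeholder start URL, as in the snippet above
 host = urlparse(start).netloc        # only crawl pages on this host

 br = mechanize.Browser()
 queue = deque([start])
 seen = {start}

 while queue:
     url = queue.popleft()
     try:
         response = br.open(url)
         html = response.read()
     except Exception:
         continue                     # skip pages that fail to load
     # ... save `html` to disk here ...
     try:
         links = list(br.links())     # fails if the page is not HTML
     except Exception:
         continue
     for link in links:
         target = link.absolute_url
         # enqueue each same-host link exactly once
         if urlparse(target).netloc == host and target not in seen:
             seen.add(target)
             queue.append(target)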

If you really want to mirror a whole site, use wget. Doing it in Python is only worth the effort if you need some kind of smart processing (executing JavaScript, selectively following links, and so on).
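For reference, a typical recursive-mirroring invocation would look something like this; these are standard wget options, shown only as a starting point:

 wget --mirror --convert-links --page-requisites --no-parent http://page.com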


Source: https://habr.com/ru/post/1403939/

