How do you archive an entire site for offline viewing?

We have actually burned static/archived copies of our ASP.NET sites for clients many times. We have used WebZip until now, but we have had endless problems with crashes, downloaded pages that were not re-linked correctly, and so on.

We basically need an application that crawls and downloads static copies of everything on our ASP.NET website (pages, images, documents, CSS, etc.) and then processes the downloaded pages so they can be browsed locally without an Internet connection (getting rid of absolute URLs in links, etc.). The more idiot-proof, the better. This seems like a fairly common and (relatively) simple process, but I have tried several other applications and have been really unimpressed.

Does anyone have archiving software that they would recommend? Does anyone have a really simple process to share?

+49
html web-crawler archive
Feb 11 '09 at 21:22
10 answers

On Windows, you can look at HTTrack. It is highly configurable and lets you limit the download speed, but you can also just point it at a website and run it with no configuration at all.

In my experience it has been a really good tool and works well. Some of the things I like about HTTrack are listed below, followed by a sketch of a typical command line:

  • Open source license
  • Can resume interrupted downloads
  • Can update an existing archive
  • You can configure it to be non-aggressive when it downloads, so it does not waste your bandwidth or the site's bandwidth.
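
For reference, a minimal HTTrack command line might look like the sketch below. The output folder, filter pattern, and rate cap are illustrative assumptions rather than required settings, and the Windows GUI exposes the same options.

# mirror the site into ./archive, stay on example.com, and cap the transfer rate (example values)
httrack "http://www.example.com/" -O ./archive "+*.example.com/*" --max-rate=25000 -v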
+35
Feb 11 '09 at 21:34

You can use wget:

wget -m -k -K -E http://url/of/web/site 
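
For reference, those short flags expand to the long options below; this is standard wget behaviour, though older wget releases spell -E as --html-extension.

# -m  --mirror             recurse with infinite depth and keep timestamps
# -k  --convert-links      rewrite links in the saved pages so they work offline
# -K  --backup-converted   keep a .orig copy of each page before its links are rewritten
# -E  --adjust-extension   save pages with an .html extension
wget --mirror --convert-links --backup-converted --adjust-extension http://url/of/web/site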
+64
Feb 11 '09 at 21:25

Hartator's Wayback Machine Downloader is simple and fast.

Install it through Ruby, then run it with the desired domain and an optional timestamp from the Internet Archive.

sudo gem install wayback_machine_downloader
mkdir example
cd example
wayback_machine_downloader http://example.com --timestamp 19700101000000
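
If you only need part of a site, the gem also accepts filtering flags. The ones below existed at the time of writing, but check wayback_machine_downloader --help for the version you install.

# restrict the download to matching URLs and skip image files (example filters)
wayback_machine_downloader http://example.com --only "/blog/" --exclude "/\.(gif|jpg|png)$/"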
+5
Nov 02 '15 at 1:07

I use Blue Crab on OSX and WebCopier on Windows.

+4
Feb 11 '09 at 21:26

wget -r -k

... and explore the rest of the options. I hope you have followed these guidelines: http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html so that all of your resources can be retrieved safely with GET requests.

+2
Feb 11 '09 at 21:26

I just use: wget -m <url> .

+1
Feb 11 '09 at 21:25

For OS X users, I have found that the SiteSucker application (found here) works well without configuring anything except how deeply it follows links.

+1
Apr 24 '13 at 14:24

If your clients are archiving for compliance reasons, you want to make sure the content can be authenticated. The options listed here are fine for simple viewing, but they are not legally admissible. In that case you are looking for timestamps and digital signatures, which is much harder to do yourself. I would suggest a service like PageFreezer.

+1
Mar 09 '15 at 18:23

I have been using HTTrack for several years now. It handles all of the inter-page re-linking and so on just fine. My only complaint is that I have not found a good way to keep it limited to a sub-site. For example, if there is a site www.foo.com/steve that I want to archive, it will most likely follow links to www.foo.com/rowe and archive that too. Otherwise it's great: highly configurable and reliable. A possible workaround using HTTrack's filters is sketched below.
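
On the sub-site point, HTTrack's scan rules (filters) can usually rein the crawl in. A hedged sketch, reusing the example paths above; the exact patterns may need tuning:

# exclude everything by default, then allow only the /steve sub-tree (the last matching filter wins)
httrack "http://www.foo.com/steve/" -O ./steve-archive "-*" "+www.foo.com/steve/*"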

0
Feb 11 '09 at 21:58

Also check out ArchiveBox (formerly Bookmark Archiver).

It is a self-hosted, open-source web archiving tool that can save pages from your bookmarks, browser history, RSS feeds, and more.
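
As a rough sketch of getting started, assuming a recent ArchiveBox release installed from PyPI (see the project's README for current instructions):

pip install archivebox
mkdir my-archive && cd my-archive
archivebox init                        # create the archive data folder in the current directory
archivebox add 'https://example.com'   # snapshot a single URL into the archive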

0
Dec 21 '18 at 23:31


