Save full web page

I ran into a problem while working on a project. I want to "crawl" certain sites of interest and save them as a "full web page", including styles and images, in order to build a mirror of them. Several times I have bookmarked a site to read it later, only to find a few days afterwards that the site was gone because it had been hacked and the owner had no backup of the database.

Of course, I can read a page in PHP easily enough with fopen("http://website.com", "r") or fsockopen(), but the main goal is to save the complete page, so that if the site goes down it is still available to others, like a programming "time machine" :)
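Just to show what I mean, here is a rough sketch (the URL is only an example); it grabs the raw HTML and nothing else:

    <?php
    // Fetch only the HTML of one page -- no CSS, images or scripts come with it.
    $url  = 'http://website.com';
    $html = '';

    $handle = fopen($url, 'r');   // needs allow_url_fopen enabled in php.ini
    if ($handle !== false) {
        while (!feof($handle)) {
            $html .= fread($handle, 8192);
        }
        fclose($handle);
        file_put_contents('page.html', $html);
    }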

Is there a way to do this without reading and saving each link on the page?

Objective-C solutions are also welcome, as I am trying to learn it as well.

Thanks!

+3
5 answers

To do this properly you really need to parse the HTML and every referenced CSS file, which is NOT easy. A much quicker way is to use an external tool like wget. Once wget is installed, you can run it from the command line:

    wget --no-parent --timestamping --convert-links --page-requisites --no-directories --no-host-directories -erobots=off http://example.com/mypage.html

This will download mypage.html together with all related CSS files, images, and images referenced from inside the CSS. With wget installed on your system, you can use the PHP function system() to drive wget programmatically.
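A minimal sketch of that system() call might look like this (the URL and output directory below are example values):

    <?php
    // Example values -- replace with the page to mirror and a writable directory.
    $url     = 'http://example.com/mypage.html';
    $destDir = __DIR__ . '/mirror';

    if (!is_dir($destDir)) {
        mkdir($destDir, 0755, true);
    }

    // Build the wget command; escapeshellarg() keeps the URL and path shell-safe.
    $cmd = 'wget --no-parent --timestamping --convert-links --page-requisites'
         . ' --no-directories --no-host-directories -erobots=off'
         . ' -P ' . escapeshellarg($destDir)
         . ' ' . escapeshellarg($url);

    // system() echoes wget's output and returns the exit code via $exitCode.
    system($cmd, $exitCode);

    if ($exitCode !== 0) {
        echo "wget exited with code $exitCode\n";
    }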

NOTE: you need at least wget 1.12 to correctly save images that are referenced through CSS files.

+16

If you are on Linux, wget will do this for you, and it is easy to script.

The one thing to keep an eye on is how far the crawl goes: make it stop if it reaches a different domain!

+3

For Objective-C, look at WebKit's WebArchive support.
The API lets you save a page as a single .webarchive file (the same format Safari uses when it saves a page).

Advantages:

  • Everything is bundled into one file (CSS, images, and so on)
  • The .webarchive file can be previewed with QuickLook
+1

To save a page along with everything it references (styles, images and so on), you would otherwise have to fetch and store every resource yourself.

Hasn't wget already been mentioned? It is a standard Unix tool that does exactly this, and you can call it from your own code.

0

, " -" - , , Windows - Teleport Pro SiteCrawler Mac.

0

Source: https://habr.com/ru/post/1722690/

