Save full web page

I ran into a problem while working on a project. I want to "crawl" certain sites of interest and save them as a "full web page", including styles and images, in order to build a mirror of them. Several times I have bookmarked a site to read it later, only to find a few days afterwards that the site was gone because it had been hacked and the owner had no backup of the database.

Of course, I can read a page in PHP easily enough with fopen("http://website.com", "r") or fsockopen(), but the main goal is to save the complete page, so that if the site goes down it is still available to others, like a programming "time machine" :)
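Just to show what I mean, here is a rough sketch (the URL is only an example); it grabs the raw HTML and nothing else:

    <?php
    // Fetch only the HTML of one page -- no CSS, images or scripts come with it.
    $url  = 'http://website.com';
    $html = '';

    $handle = fopen($url, 'r');   // needs allow_url_fopen enabled in php.ini
    if ($handle !== false) {
        while (!feof($handle)) {
            $html .= fread($handle, 8192);
        }
        fclose($handle);
        file_put_contents('page.html', $html);
    }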

Is there a way to do this without reading and saving each link on the page?

Objective-C solutions are also welcome, as I am trying to learn it as well.

Thanks!

+3
5 answers

To do this properly you really need to parse the HTML and every referenced CSS file, which is NOT easy. A much quicker way is to use an external tool like wget. Once wget is installed, you can run it from the command line:

    wget --no-parent --timestamping --convert-links --page-requisites --no-directories --no-host-directories -erobots=off http://example.com/mypage.html

This will download mypage.html together with all related CSS files, images, and images referenced from inside the CSS. With wget installed on your system, you can use the PHP function system() to drive wget programmatically.
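A minimal sketch of that system() call might look like this (the URL and output directory below are example values):

    <?php
    // Example values -- replace with the page to mirror and a writable directory.
    $url     = 'http://example.com/mypage.html';
    $destDir = __DIR__ . '/mirror';

    if (!is_dir($destDir)) {
        mkdir($destDir, 0755, true);
    }

    // Build the wget command; escapeshellarg() keeps the URL and path shell-safe.
    $cmd = 'wget --no-parent --timestamping --convert-links --page-requisites'
         . ' --no-directories --no-host-directories -erobots=off'
         . ' -P ' . escapeshellarg($destDir)
         . ' ' . escapeshellarg($url);

    // system() echoes wget's output and returns the exit code via $exitCode.
    system($cmd, $exitCode);

    if ($exitCode !== 0) {
        echo "wget exited with code $exitCode\n";
    }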

NOTE: you need at least wget 1.12 to correctly save images that are referenced through CSS files.

+16

If you are on Linux, wget will do this for you, and it is easy to script.

The one thing to keep an eye on is how far the crawl goes: make it stop if it reaches a different domain!

+3

For Objective-C, look at WebKit's WebArchive support.
The API lets you save a page as a single .webarchive file (the same format Safari uses when it saves a page).

Advantages:

  • Everything is bundled into one file (CSS, images, and so on)
  • The .webarchive file can be previewed with QuickLook
+1

To save a page along with everything it references (styles, images and so on), you would otherwise have to fetch and store every resource yourself.

Hasn't wget already been mentioned? It is a standard Unix tool that does exactly this, and you can call it from your own code.

0

, " -" - , , Windows - Teleport Pro SiteCrawler Mac.

0

Source: https://habr.com/ru/post/1722690/

