Problem with the -N wget option

I am trying to mirror a website using wget. Here is my command:

wget -t 3 -N -k -r -x

"-N" (--timestamping) is supposed to skip a file when the server's version is no newer than the local copy. But that is not what happens: the same files are downloaded again every time I restart the mirroring operation, even when the files have not changed at all.
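For reference, the check that -N performs boils down to comparing the server's Last-Modified date with the local file's modification time. A minimal local sketch of that comparison (assuming GNU coreutils `date` and `touch`; the header value and the file name `local.html` are made up for illustration):

```shell
# Hypothetical Last-Modified value a server might send.
remote="Wed, 21 Oct 2015 07:28:00 GMT"

# Create a local file and stamp it with an older modification time.
touch -d "2015-10-20 00:00:00 UTC" local.html

remote_ts=$(date -d "$remote" +%s)   # header date -> epoch seconds
local_ts=$(date -r local.html +%s)   # file mtime  -> epoch seconds

if [ "$remote_ts" -gt "$local_ts" ]; then
  echo "re-download"                 # server copy is newer
else
  echo "keep local copy"
fi
```

If the server never sends Last-Modified, the left-hand side of that comparison simply does not exist, and wget has nothing to compare against.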

For many of the downloaded pages wget reports:

Last-modified header missing -- time-stamps turned off.

I have tried mirroring several other websites, and every attempt shows the same problem.

Is this something controlled by the remote server? Are they simply not sending the right time-stamp headers? If so, is there not much I can do about it?

I know about the -nc (--no-clobber) option, but that prevents an existing file from being overwritten even when the server's copy is newer, which leads to stale data accumulating.

Thanks, drew

1 answer

The wget -N switch does work, but many web servers do not send the Last-Modified header, for various reasons. For example, dynamic pages (PHP, any CMS, etc.) have to implement the functionality explicitly (work out when the content was last modified and send the header). Some do, and some do not.
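One way to see whether a particular server sends the header at all is a spider request that prints the response headers. A sketch, run against a throwaway local server so it works anywhere (python3's http.server does send Last-Modified for static files; port 8000 and the file name `page.html` are arbitrary choices -- substitute the real site you are mirroring):

```shell
# Serve a static file locally so the headers can be inspected
# without depending on a remote site.
echo 'hello' > page.html
python3 -m http.server 8000 >/dev/null 2>&1 &
srv=$!
sleep 1

# --spider fetches only the headers; --server-response prints them
# (wget writes them to stderr, hence the 2>&1 redirect).
headers=$(wget --server-response --spider http://localhost:8000/page.html 2>&1)
kill $srv

echo "$headers" | grep -i 'Last-Modified'
```

If the grep prints nothing for the site you are mirroring, -N has no timestamp to work with and will re-download the file every time.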

There is really no other reliable way to check whether a file has been modified.


Source: https://habr.com/ru/post/1791421/
