I'm trying to mirror a website, but some of its URLs have overlapping paths that collide when wget saves them to disk in its usual way. The problem shows up with URLs such as http://example.com/news and http://example.com/news/article1.
Wget saves these URLs as /news and /news/article1, which means the file /news gets overwritten by the directory of the same name.
For a proper static mirror, these two URLs would need to be saved as /news/index.html and /news/article1 instead.
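To make the mapping I have in mind concrete, here is a rough sketch in shell. It is not something wget does; map_paths is a hypothetical helper name I made up, and it assumes the URL paths are fed in one per line. Any path that is also a prefix of another path is treated as a directory and gets /index.html appended; everything else is left alone.

```shell
# Map each URL path (one per line on stdin) to a collision-free file path.
# A path that is also a prefix of another path (e.g. /news vs /news/article1)
# has to become a directory, so its own content goes to <path>/index.html.
# map_paths is a hypothetical helper, not part of wget.
map_paths() {
  awk '
    { paths[NR] = $0 }
    END {
      for (i = 1; i <= NR; i++) {
        target = paths[i]
        for (j = 1; j <= NR; j++) {
          # index(...) == 1 means paths[j] starts with paths[i] followed by "/"
          if (index(paths[j], paths[i] "/") == 1) {
            target = paths[i] "/index.html"
            break
          }
        }
        print target
      }
    }'
}

printf '%s\n' /news /news/article1 | map_paths
```

For the two example URLs this prints /news/index.html followed by /news/article1, which is the layout I would like the mirror to end up with.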
I tried to work around this by running wget twice and moving the files accordingly, but that did not work for me. The page /news contains links to /news/article1 that need to be converted. I use the -k option to convert links, but when I run wget twice, it does not convert links between files downloaded in separate runs.
Here is my command:
wget -p -r -l4 -k -d -nH http://example.com
Here is the workaround I tried:
wget -p -r -l1 -k -nH http://example.com
mv news /tmp/news.html
wget -p -r -l4 -k -nH http://example.com
mv /tmp/news.html news/index.html
In the above example, the links on the /news page that should point to /news/article1 were not converted.
Does anyone know how to get around this with wget? Is there another tool that would work better?