Optimize web page scraping with wget

I use wget to download a huge list of web pages (about 70,000), and I have to sleep about 2 seconds between consecutive wget calls. This takes a huge amount of time, something like 70 days. What I would like to do is to use a proxy so that I can significantly speed up the process. I am using a simple bash script for this. All suggestions and comments are appreciated.
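For reference, the serial loop being described might look like the following minimal Python sketch. The bash original is not shown, so this is an assumed equivalent: the `urls.txt` file, the `pages/` output directory, and the exact wget flags are illustrative, only the one-request-at-a-time structure and the 2-second pause come from the question.

```python
import subprocess
import time

# Assumed input: one URL per line in urls.txt
with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    # -q: quiet, -P pages: save the downloaded file into the pages/ directory
    subprocess.run(["wget", "-q", "-P", "pages", url], check=False)
    time.sleep(2)  # the per-request delay described in the question
```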

1 answer

The first suggestion is not to use Bash or wget. I would use Python and Beautiful Soup. Wget is not really meant for screen scraping.
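A minimal sketch of that approach, assuming the `requests` and `beautifulsoup4` packages; what you actually extract from each page is not stated in the question, so pulling the page title here is just an illustration:

```python
import requests
from bs4 import BeautifulSoup

def fetch_and_parse(url, timeout=30):
    """Download one page and extract its title with Beautiful Soup."""
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    title = soup.title.string.strip() if soup.title and soup.title.string else ""
    return resp.text, title

if __name__ == "__main__":
    html, title = fetch_and_parse("https://example.com/")
    print(title)
```

If you do route traffic through proxies, `requests.get` also accepts a `proxies` dictionary per call, which is one way to spread requests over several exit addresses.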

Distribute the load across multiple machines by running part of your list on each machine.

In other words, split the list and run the script on each machine.
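One low-tech way to do that split is a small helper that shards the URL list into one file per machine; the file names and the machine count below are hypothetical:

```python
# Split urls.txt into N roughly equal shard files (urls_00.txt, urls_01.txt, ...),
# copy each shard to its machine, and run the existing download script on it.
N = 4  # hypothetical number of machines

with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

shards = [urls[i::N] for i in range(N)]  # round-robin split
for i, shard in enumerate(shards):
    with open(f"urls_{i:02d}.txt", "w") as out:
        out.write("\n".join(shard) + "\n")
```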


Source: https://habr.com/ru/post/1794010/

