Wget with sleep for a friendly workaround

How do I download from a list of URLs and pause between downloads?

I have a list of URLs in url.txt, for example:

 http://manuals.info.apple.com/cs_CZ/Apple_TV_2nd_gen_Setup_Guide_cz.pdf
 http://manuals.info.apple.com/cs_CZ/apple_tv_3rd_gen_setup_cz.pdf
 http://manuals.info.apple.com/cs_CZ/imac_late2012_quickstart_cz.pdf
 http://manuals.info.apple.com/cs_CZ/ipad_4th-gen-ipad-mini_info_cz.pdf
 http://manuals.info.apple.com/cs_CZ/iPad_iOS4_Important_Product_Info_CZ.pdf
 http://manuals.info.apple.com/cs_CZ/iPad_iOS4_Uzivatelska_prirucka.pdf
 http://manuals.info.apple.com/cs_CZ/ipad_ios5_uzivatelska_prirucka.pdf
 http://manuals.info.apple.com/cs_CZ/ipad_ios6_user_guide_cz.pdf
 http://manuals.info.apple.com/cs_CZ/ipad_uzivatelska_prirucka.pdf

I tried wget -i url.txt, but after a while it stops because the server detects the rapid requests as an unfriendly scan.

How can I pause between each URL?

How can I do this with scrapy?

2 answers

wget

 wget --wait=10 --random-wait --input-file=url.txt 

scrapy

 scrapy crawl yourbot -s DOWNLOAD_DELAY=10 -s RANDOMIZE_DOWNLOAD_DELAY=1 

You can add some delay between each request with the -w or --wait options.

  -w seconds or --wait=seconds 
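The same wait-then-download pattern can be sketched in plain Python, without wget, reading url.txt one line at a time. This is only a sketch: polite_download and its helpers are made-up names, and the 0.5x to 1.5x jitter imitates what wget's --random-wait does on top of --wait.

```python
import random
import time
import urllib.request
from pathlib import Path


def random_delay(wait):
    """Pick a delay the way wget --random-wait does:
    uniformly between 0.5x and 1.5x the base wait."""
    return random.uniform(0.5 * wait, 1.5 * wait)


def filename_from_url(url):
    """Use the last path component of the URL as the local filename."""
    return url.rsplit("/", 1)[-1]


def polite_download(url_file, wait=10, dest_dir="."):
    """Download every URL listed in url_file (one per line),
    sleeping a randomized interval between requests."""
    with open(url_file) as f:
        urls = [line.strip() for line in f if line.strip()]
    for i, url in enumerate(urls):
        dest = Path(dest_dir) / filename_from_url(url)
        urllib.request.urlretrieve(url, dest)
        if i < len(urls) - 1:  # no pause needed after the last file
            time.sleep(random_delay(wait))
```

This gives you the same "friendly" pacing as the wget flags, plus a place to add retries or logging if the server still pushes back.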

Source: https://habr.com/ru/post/1202865/

