How to use regular expressions in wget to reject files?

I am trying to download the contents of a website using wget. I used the -R option to reject some file types, but there are other files that I don't want to download. These files are named as follows and do not have any extension.

string-ID 

For example:

 newsbrief-02 

How can I tell wget not to download these files (files whose names begin with the specified string)?

+6
2 answers

The -R option of wget does not accept a regular expression, but it does accept a wildcard pattern (like a file name pattern in the shell).

For your case, the command looks like this:

 $ wget -R 'newsbrief-*' ... 

You can also use ? and character classes [...].

For more information, see info wget.
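To preview which file names a pattern such as 'newsbrief-*' would reject before running wget, the shell's own case globbing can serve as a rough stand-in. This is only a sketch: it assumes that wget's -R matching behaves like shell globbing for simple patterns of this kind, and the helper name would_reject and the sample file names are made up for illustration.

```shell
#!/bin/sh
# would_reject PATTERN NAME: hypothetical helper. Prints "reject" if NAME
# matches the shell glob PATTERN, otherwise "keep".
would_reject() {
    case "$2" in
        $1) echo reject ;;   # $1 is left unquoted so it is treated as a pattern
        *)  echo keep ;;
    esac
}

would_reject 'newsbrief-*' 'newsbrief-02'            # * matches any run of characters
would_reject 'newsbrief-*' 'index.html'
would_reject 'newsbrief-0?' 'newsbrief-02'           # ? matches exactly one character
would_reject 'newsbrief-[0-9][0-9]' 'newsbrief-ab'   # [0-9] matches one digit
```

Quoting the pattern on the wget command line, as in the answer above, keeps the shell from expanding it against local files before wget sees it.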

+4

Since v1.14 (apparently), wget accepts regular expressions via --reject-regex and --accept-regex. The regex type defaults to posix (--regex-type posix) and can be set to pcre if wget was compiled with libpcre support.
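For the file names in the question, a regex can be tried out offline before it is handed to wget. The sketch below uses grep -E as a stand-in, assuming that wget's default posix type behaves like POSIX extended regular expressions and keeping in mind that wget applies the regex to the complete URL rather than just the file name. The URLs and the pattern are illustrative assumptions, not taken from the original question.

```shell
#!/bin/sh
# Hypothetical URLs standing in for links wget might find on the site.
urls='http://example.com/index.html
http://example.com/news/newsbrief-02
http://example.com/news/newsbrief-17
http://example.com/about'

# Assumed pattern: match URLs whose last path component starts with "newsbrief-".
pattern='/newsbrief-[^/]*$'

# grep -E uses POSIX extended regexes; -v keeps the URLs that do NOT match,
# i.e. the ones that would still be downloaded.
kept=$(printf '%s\n' "$urls" | grep -Ev "$pattern")
printf '%s\n' "$kept"

# The corresponding wget call (not run here) would be along the lines of:
#   wget -r --reject-regex '/newsbrief-[^/]*$' http://example.com
```

Anchoring on the leading slash avoids rejecting URLs that merely contain "newsbrief-" somewhere else in the path.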

Note that --reject-regex can be used only once per wget call. To reject several patterns, combine them with | into a single regex:

 wget --reject-regex 'expr1|expr2|…' http://example.com 
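When the list of expressions is long, the alternation can be assembled in the shell instead of typed by hand. A minimal sketch follows; join_regex is a hypothetical helper, and the three sample expressions are assumptions chosen only to show the joining.

```shell
#!/bin/sh
# join_regex EXPR...: hypothetical helper that joins its arguments with "|".
# The body runs in a subshell ( ... ) so the IFS change does not leak out.
join_regex() (
    IFS='|'
    printf '%s\n' "$*"   # "$*" joins the arguments with the first character of IFS
)

reject=$(join_regex 'newsbrief-' '\.tmp$' '/drafts/')
printf '%s\n' "$reject"

# Not run here; the combined expression would then be passed once:
#   wget -r --reject-regex "$reject" http://example.com
```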
+18

Source: https://habr.com/ru/post/919165/

