The challenge here is to extract all of the links when there may be several on a single line; otherwise you could just do:
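    " Extract all lines with href= (the '.' limits each write to the matching line;
    " the '!' lets the append create the file if it does not exist yet)
    :g/href="[^"]\+"/.w! >> list_of_links.txt
    " Open the new file
    :e list_of_links.txt
    " Extract the bit inside the quotation marks
    :%s/.*href="\([^"]\+\)".*/\1/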
The simplest approach would probably be to do this:
    " Save as a new file name
    :saveas list_of_links.txt
    " Get rid of any lines without href=
    :g!/href="\([^"]\+\)"/d
    " Break up the lines wherever there is a 'href='
    :%s/href=/\rhref=/g
    " Drop the leftover fragments that preceded the first href= on each line
    :g!/href="[^"]\+"/d
    " Tidy up by removing everything but the bit we want
    :%s/^.*href="\([^"]\+\)".*$/\1/
Alternatively (along similar lines):
    :g/href="[^"]\+"/.w! >> list_of_links.txt
    :e list_of_links.txt
    :%s/href=/\rhref=/g
    :g!/href="[^"]\+"/d
    :%s/^.*href="\([^"]\+\)".*$/\1/
(see :help saveas, :help :vglobal, :help :s)
However, if you really wanted to do it in a more direct way, you could do something like this:
    " Initialise register 'h'
    :let @h = ""
    " For each line containing href=..., get the line, and carry out a global search
    " and replace that extracts just the URLs and a double quote (as a delimiter)
    :g/href="[^"]\+"/let @h .= substitute(getline('.'), '.\{-}href="\([^"]\+\)".\{-}\ze\(href=\|$\)', '\1"', 'g')
    " Create a new file
    :new
    " Paste the contents of register h (entered in normal mode)
    "hp
    " Replace all double quotes with new-lines
    :s/"/\r/g
    " Save (the new buffer has no name yet, so give it one)
    :w list_of_links.txt
Finally, you can do this in a function with a for loop, but I will leave it for someone else to write!
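For what it's worth, here is one rough sketch of how such a function might look. The name ExtractLinks is made up for illustration, and it assumes every link is written as href="..." with double quotes, just like the commands above. It walks the buffer with a for loop and uses the {count} argument of matchstr() to pick out the first, second, third... link on each line:

```vim
" Hypothetical helper (the name is only an illustration): return a list of
" every href="..." value found in the current buffer.
function! ExtractLinks() abort
  let l:links = []
  for l:line in getline(1, '$')
    let l:n = 1
    while 1
      " Ask for the n-th match on this line; \zs and \ze keep only the URL
      let l:url = matchstr(l:line, 'href="\zs[^"]\+\ze"', 0, l:n)
      if l:url ==# ''
        " No n-th match, so this line is finished
        break
      endif
      call add(l:links, l:url)
      let l:n += 1
    endwhile
  endfor
  return l:links
endfunction
```

After sourcing it, something like this should put one link per line into a new file (call the function before :new, since it reads the current buffer):

```vim
:let links = ExtractLinks()
:new
:call setline(1, links)
:w list_of_links.txt
```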