I would like to use curl on the command line to grab a URL, pass it to a pattern, and return a list of URLs matching that pattern.
I am running into problems with the greedy aspects of the pattern and can't seem to get past them. Any help on this would be appreciated.
curl http://www.reddit.com/r/pics/ | grep -ioE "http://imgur\.com/.+(jpg|jpeg|gif|png)"
So: take the data from the URL (which comes back as an HTML mess, so some line-wrangling may be needed), run it through a regex, and return every match, allowing for more than one match per line. The pattern is pretty simple: any string that ...
- starts at http://imgur.com/
- has A-Z, a-z, 0-9 (maybe some others) in the middle, typically around 5 characters; 8 should cover it if I wanted to limit that part of the pattern, which I don't
- ends in a .graphic_file_format_extension (jpg, jpeg, gif, png)
That's about it. With that URL at its default settings, I should usually get a good set of images. I wouldn't mind using the RSS feed URL for the same page instead; it might actually be easier to parse.
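For what it's worth, one way around the greedy `.+` is to restrict the middle of the pattern to the allowed characters, so the match cannot run past the extension. A minimal sketch (using an inline sample string in place of the real curl output, purely for illustration):

```shell
# Hypothetical sample standing in for the HTML that curl would fetch.
html='<a href="http://imgur.com/a1B2c.jpg">pic</a> <img src="http://imgur.com/XyZ99.png">'

# [A-Za-z0-9]+ instead of .+ keeps the match from swallowing everything
# up to the last extension on the line; -o prints each match on its own line.
echo "$html" | grep -oE 'http://imgur\.com/[A-Za-z0-9]+\.(jpg|jpeg|gif|png)'
# prints:
# http://imgur.com/a1B2c.jpg
# http://imgur.com/XyZ99.png
```

Against the live page you would replace the `echo` with `curl -s http://www.reddit.com/r/pics/`.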
Thanks everyone!
Edit: Thanks for the quick responses; my final command is now:
$ curl -s http://www.reddit.com/r/pics/ | grep -ioE "http:\/\/imgur\.com\/.{1,10}\.(jpg|jpeg|gif|png)"