Regex to extract image URLs?

I need to extract a bunch of image URLs from a document in which each image is assigned to a name, like this:

bellpepper = "http://images.com/bellpepper.jpg"
cabbage = "http://images.com/cabbage.jpg"
lettuce = "http://images.com/lettuce.jpg"
pumpkin = "http://images.com/pumpkin.jpg"

I assume that I can detect the start of the link with:

/http:[^ ,]+/i

But how can I get all the links separated from the document?

EDIT: To clarify the question: I just want to pull the URLs out of the file, minus the variable name, the equals sign, and the double quotes, so that I end up with a new file that is just a list of URLs, one per line.

4 answers

Try this:

(http://)([a-zA-Z0-9\/\\.])*
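As a rough sketch of how this pattern behaves, here it is applied with Python's re module. The sample text is just the question's format repeated, and the variable names are made up for illustration. Note that the character class only covers letters, digits, slashes, backslashes, and dots, so a URL containing something like "-" or "_" would be cut short at that character.

```python
import re

# Sample text in the question's format (assumed here for illustration).
text = '''bellpepper = "http://images.com/bellpepper.jpg"
cabbage = "http://images.com/cabbage.jpg"'''

# Same pattern as above; "/" needs no escaping in a Python regex.
pattern = re.compile(r"(http://)([a-zA-Z0-9/\\.])*")

# group(0) is the whole match; the second group only keeps the last
# character it repeated over, so it is not useful on its own.
urls = [m.group(0) for m in pattern.finditer(text)]
print(urls)
```

Using finditer with group(0) avoids the tuples that re.findall returns when a pattern contains capturing groups.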

If the format is constant then this should work (python):

import re
s = """bellpepper = "http://images.com/bellpepper.jpg" (...) """
# capture whatever sits between the double quotes, starting at http://
print(re.findall(r'"(http://.+?)"', s))

Note: the regexp matches the text between the double quotes, not a bare URL :)
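To get from the match list to the one-URL-per-line output the question asks for, something like the following might do. It is only a sketch: extract_urls and the output file name urls.txt are hypothetical names, and the sample text is the question's data.

```python
import re

def extract_urls(text):
    """Return every double-quoted http:// URL found in text, in order."""
    return re.findall(r'"(http://.+?)"', text)

sample = '''bellpepper = "http://images.com/bellpepper.jpg"
cabbage = "http://images.com/cabbage.jpg"
lettuce = "http://images.com/lettuce.jpg"
pumpkin = "http://images.com/pumpkin.jpg"
'''

# Join the matches so that each URL sits on its own line.
output = "\n".join(extract_urls(sample)) + "\n"
print(output, end="")

# To save the list, write the same string to a file (hypothetical name):
# with open("urls.txt", "w") as out:
#     out.write(output)
```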

Do you mean that your document has this format and you just want to get the http part? You can simply split on the " = " delimiter, without a regex:

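$f = fopen("file", "r");
if ($f) {
    // fgets() returns false at end of file, so no separate feof() check is needed
    while (($line = fgets($f, 4096)) !== false) {
        // split the line into the name and the quoted URL
        $s = explode(" = ", trim($line), 2);
        if (count($s) < 2) {
            continue; // skip lines that are not assignments
        }
        // strip the double quotes and print one URL per line
        print str_replace('"', '', $s[1]) . "\n";
    }
    fclose($f);
}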

On the command line:

#php5 myscript.php > newfile.ext

If you use a language other than PHP, there is a similar string-splitting function that you can use, e.g. split() in Python or Perl. Read your language's documentation to find out.
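For instance, the same splitting idea might look like this in Python. This is a sketch that assumes every relevant line has the exact form name = "url"; url_from_line is a hypothetical helper name.

```python
def url_from_line(line):
    """Return the URL from a 'name = "url"' line, or None if there is no ' = '."""
    parts = line.strip().split(" = ", 1)
    if len(parts) != 2:
        return None  # not an assignment line
    # Drop the surrounding double quotes from the right-hand side.
    return parts[1].strip('"')

lines = [
    'bellpepper = "http://images.com/bellpepper.jpg"\n',
    '\n',
    'cabbage = "http://images.com/cabbage.jpg"\n',
]
urls = [u for u in map(url_from_line, lines) if u is not None]
print("\n".join(urls))
```

When reading from a real file, the list of lines can be replaced by iterating over the open file object.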

You can try this if your tool supports a positive lookbehind:

/(?<=")[^"\n]+/
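Python's re module supports fixed-width lookbehind, so the pattern can be tried there as a quick check. The sample text below is assumed from the question. One caveat: the lookbehind also fires right after a closing quote, so any text that follows the URL on the same line would be matched as well; it works here only because each closing quote is immediately followed by a line break or the end of the text.

```python
import re

# Sample text in the question's format (assumed here for illustration).
text = '''bellpepper = "http://images.com/bellpepper.jpg"
cabbage = "http://images.com/cabbage.jpg"'''

# (?<=") requires a double quote just before the match without consuming it;
# [^"\n]+ then runs up to the next quote or line break.
urls = re.findall(r'(?<=")[^"\n]+', text)
print("\n".join(urls))
```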

Source: https://habr.com/ru/post/1712873/

