How to extract image src values from HTML with a regex?

I need to extract the src attribute from all image tags in an HTML document.

So the input is an HTML page, and the output should be a list of URLs pointing to images, for example: http://www.google.com/intl/en_ALL/images/logo.gif

The following is what I came up with:

<img\s+src=""(http://.*?)

This does not work for tags where src is not located immediately after the img tag, for example:

<img height="1px" src="spacer.gif">

Can someone help me complete this regex? It is probably easy, but I thought asking here might be a faster way to get an answer.
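To make the failure concrete, the sketch below tests only the start of the pattern: `<img`, whitespace, then `src="`. It assumes the doubled `""` in the original is C#-style verbatim-string escaping for a single quote. The sample tags are the ones from the question.

```javascript
// Only the prefix of the pattern from the question: "<img", whitespace, then src=".
// Assumption: the doubled "" in the original is C# escaping for a single quote.
const prefix = /<img\s+src="/;

const srcFirst = '<img src="http://www.google.com/intl/en_ALL/images/logo.gif">';
const srcLater = '<img height="1px" src="spacer.gif">';

console.log(prefix.test(srcFirst)); // true
console.log(prefix.test(srcLater)); // false: height="1px" sits between <img and src
```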

4 answers

The following regexp snippet should work.

<img[^>]+src="([^">]+)"

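It matches the text <img, followed by one or more characters that are not >, followed by src=". It then captures everything up to the next " or >.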
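As a rough illustration of how this pattern produces the list of URLs the question asks for, here is a JavaScript sketch. `extractImageSources` and the sample markup are hypothetical; the `g` flag is added because `String.prototype.matchAll` requires a global regex.

```javascript
// The answer's pattern, with the g flag so matchAll can walk every <img> tag.
const imgSrc = /<img[^>]+src="([^">]+)"/g;

function extractImageSources(html) {
  // Capture group 1 of each match holds the src value.
  return Array.from(html.matchAll(imgSrc), (m) => m[1]);
}

const page =
  '<p><img src="http://www.google.com/intl/en_ALL/images/logo.gif">' +
  '<img height="1px" src="spacer.gif"></p>';

const sources = extractImageSources(page);
// sources is ["http://www.google.com/intl/en_ALL/images/logo.gif", "spacer.gif"]
```

The pattern only recognizes double-quoted values, and because nothing ties `src` to an attribute boundary, an attribute such as `data-src="..."` would also match.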

, HTML. .


. HTML - , .

. , XML HTML ?

:

HTML-


You can do this quite easily with JavaScript running in the page. For example:

var images = document.getElementsByTagName("img");

// declare i so the loop does not create an implicit global
for (var i = 0; i < images.length; i++)
{
   // get image src
   var currImage = images[i].src;

   // do link creation here
}
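The loop above relies on the browser having already parsed the page. A slightly more compact sketch that returns the list of URLs directly is shown here; `collectImageSources` is a hypothetical helper, written to accept any list of elements so it can be called on `document.images` in a page.

```javascript
// Turn any collection of <img> elements into an array of src strings.
function collectImageSources(images) {
  // getAttribute returns the attribute text exactly as written in the markup.
  return Array.from(images, (img) => img.getAttribute("src"));
}

// Only runs where a DOM exists (e.g. a browser page); skipped under Node.
if (typeof document !== "undefined") {
  console.log(collectImageSources(document.images));
}
```

Whether to read `getAttribute("src")` or the `.src` property depends on what you need: the property returns the URL resolved against the page's base URL, while `getAttribute` returns the raw value (for the spacer example, just `spacer.gif`).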

This works great for me.

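$regexp = '<img[^>]+src=(?:\"|\')\K(.[^">]+?)(?=\"|\')';

if (preg_match_all("/$regexp/", $content, $matches, PREG_SET_ORDER)) {
    // Each $match is one <img> tag. \K drops the text before the value and the
    // lookahead does not consume the closing quote, so $match[0] is just the src.
    // (foreach also avoids the off-by-one of looping with $i <= count($matches).)
    foreach ($matches as $match) {
        $img_src = $match[0];
        echo $img_src, "\n";
    }
}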
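The PHP answer accepts either quote style and uses `\K` so that the whole match is the URL. JavaScript regular expressions have no `\K`, so a comparable sketch (a swapped-in approach, not a port of the exact pattern above) reads the value from a capture group and uses a backreference so the closing quote must match the opening one. The function name and sample markup are hypothetical.

```javascript
// (["']) remembers which quote opened the value; \1 requires the same quote to close it.
// \s before "src" keeps attributes like data-src from matching; the i flag accepts <IMG>.
const imgSrcAnyQuote = /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi;

function extractSrcAnyQuote(html) {
  // Group 2 is the value between the quotes.
  return Array.from(html.matchAll(imgSrcAnyQuote), (m) => m[2]);
}

const sample =
  `<img src='a.png'>` +
  `<IMG alt="it's" src="b.gif">` +
  `<img data-src="lazy.jpg" src="c.jpg">`;

const found = extractSrcAnyQuote(sample);
// found is ["a.png", "b.gif", "c.jpg"]
```

Unquoted values such as `src=spacer.gif` are not matched by this sketch.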

Source: https://habr.com/ru/post/1711012/

