Regex to match two HTML tags in one HTML file

I have an HTML file that contains the following:

<img src="MATCH1" bla="blabla">
<something:else bla="blabla" bla="bla"><something:else2 something="something">
<something image="MATCH2" bla="abc">

Now I need a regular expression that matches both MATCH1 and MATCH2.

The HTML contains several blocks like this, so the pair can occur once, twice, three times, or any number of times.

When I try:

<img\s*src="(.*?)".*?<something\s*image="(.*?)"

This does not work. What am I missing here?

Thanks in advance!

+3
2 answers

Regular expressions do not always give good results when parsing HTML.

I think you should use an HTML DOM parser instead, such as Simple HTML DOM.

Example:

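// Load the Simple HTML DOM library (adjust the path to your copy)
include 'simple_html_dom.php';

// Create a DOM object from a URL
$html = file_get_html('http://www.example.com/');

// OR create a DOM object from a local HTML file
$html = file_get_html('test.htm');

// Find all images and print their src attribute
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}

// Find all links and print their href attribute
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}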

There are filters for retrieving tags with specific attributes:

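[attribute] matches elements that have the specified attribute.

[attribute=value] matches elements whose attribute has exactly that value.

[attribute!=value] matches elements whose attribute does not have that value.

[attribute^=value] matches elements whose attribute starts with that value.

[attribute$=value] matches elements whose attribute ends with that value.

[attribute*=value] matches elements whose attribute contains that value.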
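
As a rough sketch of the same attribute-filter idea without an external library, the snippet below uses PHP's bundled DOMDocument and DOMXPath in place of Simple HTML DOM (a substitution, not what the answer above uses). The sample markup and its values (a.png, A, b.png, B) are made up for illustration, and the code assumes every img is followed by exactly one something tag, so the two result lists can be paired by position.

```php
<?php
// Hypothetical sample input modelled on the question; the values are invented.
$html = <<<'HTML'
<img src="a.png" bla="blabla">
<something:else bla="blabla"><something:else2 something="something">
<something image="A" bla="abc">
<img src="b.png" bla="blabla">
<something image="B" bla="abc">
HTML;

// Non-standard tags such as <something> make libxml emit warnings; collect them quietly.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// XPath counterpart of the [attribute] filter: only nodes that carry the attribute.
$srcs   = $xpath->query('//img/@src');
$images = $xpath->query('//something/@image');

// Assumption: the n-th img belongs to the n-th something tag.
$pairs = [];
$count = min($srcs->length, $images->length);
for ($i = 0; $i < $count; $i++) {
    $pairs[] = [$srcs->item($i)->nodeValue, $images->item($i)->nodeValue];
}

print_r($pairs);
```

If the blocks are not strictly paired like this, the positional pairing will need to be replaced with something that walks the document in order.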



+10

phpQuery or QueryPath make this simple:

qp($html)->find("img")->attr("src");

If you still want a regular expression, try:

preg_match('#<img[^>]+src="([^">]*)".+?<something\s[^>]*image="([^">]*)"#ims', $html, $m);
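
Since the question says the block can repeat, preg_match on its own only returns the first pair. A possible sketch that collects every pair is shown here, using the same pattern with preg_match_all and PREG_SET_ORDER; the sample markup and values are invented. The s modifier lets . match line breaks, which is likely what the pattern in the question was missing, because the two tags sit on separate lines.

```php
<?php
// Hypothetical sample input modelled on the question; the values are invented.
$html = <<<'HTML'
<img src="a.png" bla="blabla">
<something:else bla="blabla"><something:else2 something="something">
<something image="A" bla="abc">
<img src="b.png" bla="blabla">
<something image="B" bla="abc">
HTML;

// i = case-insensitive, s = let "." also match newlines.
$pattern = '#<img[^>]+src="([^">]*)".+?<something\s[^>]*image="([^">]*)"#is';

// PREG_SET_ORDER returns one array per match: [full match, group 1, group 2].
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $m) {
    $pairs[] = [$m[1], $m[2]];
}

print_r($pairs);
```

The \s after <something keeps the pattern from stopping at <something:else, since a colon is not whitespace.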


+2

Source: https://habr.com/ru/post/1782066/