PHP RegEx Group Multiple Matches

I'm working on my very first regular expression. I want to match a pseudo-HTML element and retrieve useful information such as the tag name, attributes, etc.:

    $string = '<testtag alpha="value" beta="xyz" gamma="abc" >';
    if (preg_match('/<(\w+?)(\s\w+?\s*=\s*".*?")+\s*>/', $string, $matches)) {
        print_r($matches);
    }

As a result, I get:

    Array
    (
        [0] => <testtag alpha="value" beta="xyz" gamma="abc" >
        [1] => testtag
        [2] => gamma="abc"
    )

Does anyone know how I can get the other attributes? What am I missing?

0
php regex
Jul 06 '09 at 15:46
3 answers

Try this regex:

 /<(\w+)((?:\s+\w+\s*=\s*(?:"[^"]*"|'[^']*'|[^'">\s]*))*)\s*>/ 

But you really shouldn't use regular expressions for a context-free language like HTML. Use a real parser instead.

+2
Jul 06 '09 at 15:50

As already mentioned, do not use regex to parse HTML documents.

Try using this PHP parser: http://simplehtmldom.sourceforge.net/
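If installing a third-party library is not an option, a rough sketch of the same idea with PHP's bundled DOM extension might look like the following. Note that this swaps in `DOMDocument` rather than the Simple HTML DOM library linked above, and it assumes the `dom` extension is enabled. The variable names are only illustrative.

```php
<?php
// The sample tag from the question.
$html = '<testtag alpha="value" beta="xyz" gamma="abc" >';

// Collect parser warnings internally instead of printing them;
// the HTML parser complains about unknown tag names such as "testtag".
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$tagName = null;
$attributes = [];

// Look up the first element with the tag name we are interested in.
$element = $doc->getElementsByTagName('testtag')->item(0);
if ($element !== null) {
    $tagName = $element->tagName;
    // Each entry in ->attributes is a DOMAttr with a name and a value.
    foreach ($element->attributes as $attr) {
        $attributes[$attr->name] = $attr->value;
    }
}

print_r($attributes);
```

The parser takes care of quoting, whitespace, and attribute order, so no attribute is lost the way it is with a repeated capture group.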

+1
Jul 06 '09 at 17:57

The second capture group matches the attributes one at a time, each time overwriting the previous one. If you were using .NET regular expressions, you could use the Captures collection to retrieve the individual captures, but I don't know of any other regex flavor that has this feature. Usually you need to capture all the attributes in one group and then apply another regular expression to the captured text to pull out the individual attributes.
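A minimal sketch of that two-pass approach in PHP, using the string from the question, could look like this. The patterns are simplified to double-quoted values only, and the variable names are just for illustration.

```php
<?php
// The sample tag from the question.
$string = '<testtag alpha="value" beta="xyz" gamma="abc" >';

// Pass 1: capture the tag name and the WHOLE attribute block as one group.
// The repetition sits inside a non-capturing group, so nothing is overwritten.
$tagPattern = '/<(\w+)((?:\s+\w+\s*=\s*"[^"]*")*)\s*>/';

$tagName = null;
$attributes = [];

if (preg_match($tagPattern, $string, $tag)) {
    $tagName = $tag[1];

    // Pass 2: run a second pattern over the captured block to split it
    // into individual name/value pairs.
    preg_match_all('/(\w+)\s*=\s*"([^"]*)"/', $tag[2], $pairs, PREG_SET_ORDER);
    foreach ($pairs as $pair) {
        $attributes[$pair[1]] = $pair[2];
    }
}

print_r($attributes);
```

With `PREG_SET_ORDER`, each entry of `$pairs` holds one attribute match, which keeps the loop that builds the name-to-value array simple.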

That's why people tend to either love regular expressions or hate them (or both). You can do truly amazing things with them, but you also run into simple-looking tasks like this one that turn out to be ridiculously difficult, if not impossible.

0
Jul 06 '09 at 18:01


