Repeating numbered capture groups in Perl

Imagine trying to parse the following HTML using a Perl regular expression:

<h4>test</h4> <p>num1</p> <p>num2</p> <p>num3</p> <h4>test</h4> <p>num1</p> <p>num2</p> <p>num3</p> <p>num4</p> 

using the following regular expression:

 <h4>([\w\s]*)</h4>(?:<p>([\w\s]+)</p>)+ 

How will the groups be numbered in Perl? $1 will obviously contain the text of the <h4>, but when the capture group is repeated, will the captured <p> tags go into $2, $3, and $4? Is there a good way to capture all the <p> tags in an array? Does Perl even support this? Or am I forced to write one regex for the <h4>, then another for the <p> tags?

(I know that I could use HTML::Tree or something similar to parse HTML, but this is just a simplified example to help describe the question. I'm really only interested in how repeated numbered capture groups work in Perl.)

1 answer

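When you repeat a capture group, only the last iteration's match is saved in that group. In the example, $2 ends up holding only the text of the last <p>; $3 and $4 are never set, because groups are numbered by their position in the pattern, not by how many times they match.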
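A minimal sketch of that behavior, assuming the sample input has no whitespace between tags so that the question's pattern matches as written (the variable names `$html`, `$heading`, and `$last_p` are only illustrative):

```perl
use strict;
use warnings;

# Assumed input: the question's sample, shortened and without spaces between tags.
my $html = '<h4>test</h4><p>num1</p><p>num2</p><p>num3</p>';

my ($heading, $last_p);
if ($html =~ m{<h4>([\w\s]*)</h4>(?:<p>([\w\s]+)</p>)+}) {
    # Group 2 was overwritten on every repetition; only the final value survives.
    ($heading, $last_p) = ($1, $2);
    print "heading: $heading\n";
    print "last p:  $last_p\n";
}
```

Here `$heading` is `test` and `$last_p` is `num3`; the earlier `<p>` texts are not available from this single match.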

If you want to get each match from a repeating group, you can use a replace-all with a callback function (in Perl, `s///ge`) or loop over the matches one after another (in Perl, a `while` loop over `m//g`).
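One way to loop over the matches, sketched under the assumption that the tags are not separated by whitespace: capture the whole run of `<p>` tags as a single group, then walk that run with a second `m//g` loop. The `@sections` structure and its hash keys are hypothetical names chosen for the illustration.

```perl
use strict;
use warnings;

# Assumed input: the question's sample without spaces between tags.
my $html = '<h4>test</h4><p>num1</p><p>num2</p><p>num3</p>'
         . '<h4>test</h4><p>num1</p><p>num2</p><p>num3</p><p>num4</p>';

my @sections;
# Outer loop: one iteration per <h4> block; $2 is the entire run of <p> tags.
while ($html =~ m{<h4>([\w\s]*)</h4>((?:<p>[\w\s]+</p>)+)}g) {
    my ($heading, $run) = ($1, $2);   # copy before the inner match resets $1/$2
    my @paragraphs;
    # Inner loop: step through the run one <p> at a time.
    while ($run =~ m{<p>([\w\s]+)</p>}g) {
        push @paragraphs, $1;
    }
    push @sections, { heading => $heading, paragraphs => \@paragraphs };
}

for my $section (@sections) {
    print "$section->{heading}: @{ $section->{paragraphs} }\n";
}
```

This still uses two patterns, but only one pass over each `<h4>` block, and every `<p>` ends up in an array.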

Most languages also have a "match all" that I don't know how to do in Perl. This usually saves all matches in an array for you, but repeated groups are still only saved as the last matched group.
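For what it's worth, the closest Perl equivalent of "match all" seems to be `m//g` evaluated in list context, which returns the captures from every match. A small sketch, assuming a string `$run` that contains only the `<p>` tags (the variable names are illustrative):

```perl
use strict;
use warnings;

# Assumed input: just the run of <p> tags.
my $run = '<p>num1</p><p>num2</p><p>num3</p>';

# List-context m//g: one capture per match, so every <p> text is collected.
my @all = $run =~ m{<p>([\w\s]+)</p>}g;

# A repeated group inside a single match still keeps only its last iteration.
my @last_only = $run =~ m{(?:<p>([\w\s]+)</p>)+}g;

print "all:       @all\n";
print "last only: @last_only\n";
```

`@all` holds `num1`, `num2`, and `num3`, while `@last_only` holds only `num3`, which matches the caveat above about repeated groups.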


Source: https://habr.com/ru/post/1483219/

