Imagine trying to parse the following HTML using a Perl regular expression:
<h4>test</h4> <p>num1</p> <p>num2</p> <p>num3</p> <h4>test</h4> <p>num1</p> <p>num2</p> <p>num3</p> <p>num4</p>
using the following regular expression:
<h4>([\w\s]*)</h4>(?:<p>([\w\s]+)</p>)+
How will the groups be numbered in Perl? $1 will obviously contain the text of the <h4>, but when the capture group is repeated, will the captured <p> contents go to $2, $3 and $4? Is there a good way to capture all the <p> tags in an array? Does Perl even support this? Or am I forced to write one regex for the <h4>, then another for the <p> tags?
(I know that I could use HTML::Tree or something similar for HTML parsing, but this is just a simplified example to help describe the question. I'm really only interested in how numbering works for repeated capture groups in Perl.)
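For anyone who wants to check the behaviour empirically, here is a minimal test script; treat it as a sketch rather than a definitive approach. The helper names `probe_captures` and `parse_sections` are hypothetical, and the `\s*` between tags is an added assumption so that the spaced sample input matches at all (the regex as written above does not allow whitespace between tags). The first helper runs the single regex and returns whatever ends up in $1 and $2; the second sketches the two-regex fallback mentioned in the question.

```perl
use strict;
use warnings;

# Sample input from the question.
my $html = '<h4>test</h4> <p>num1</p> <p>num2</p> <p>num3</p> '
         . '<h4>test</h4> <p>num1</p> <p>num2</p> <p>num3</p> <p>num4</p>';

# Hypothetical helper: apply the single regex once and return the
# numbered captures. The \s* is an assumption added for the spaced input.
sub probe_captures {
    my ($text) = @_;
    if ($text =~ m{<h4>([\w\s]*)</h4>(?:\s*<p>([\w\s]+)</p>)+}) {
        return ($1, $2);
    }
    return;
}

# Hypothetical helper: the two-regex fallback. The outer regex grabs the
# heading plus the whole run of <p> tags; the inner regex, used with /g in
# list context, pulls every <p> body out of that run into an array.
sub parse_sections {
    my ($text) = @_;
    my @sections;
    while ($text =~ m{<h4>([\w\s]*)</h4>((?:\s*<p>[\w\s]+</p>)+)}g) {
        my ($heading, $body) = ($1, $2);
        my @paragraphs = $body =~ m{<p>([\w\s]+)</p>}g;
        push @sections, { heading => $heading, paragraphs => \@paragraphs };
    }
    return @sections;
}

# Print whatever the single regex left in $1 and $2.
my ($first, $second) = probe_captures($html);
print "\$1 = $first\n\$2 = $second\n";

# Print each heading followed by all of its paragraphs.
for my $section (parse_sections($html)) {
    print "$section->{heading}: @{ $section->{paragraphs} }\n";
}
```

Running the first part shows directly which <p> text the repeated group keeps, and the second part collects every <p> into a per-heading array without needing a separate numbered group for each repetition.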