Infinite loop using a pair of Perl regular expressions

I wrote a small Perl script that uses regular expressions to extract HTML elements from a web page.

I know regular expressions are not a good tool for this job, but I wanted to test my regex skills.

With either of the two patterns on its own in the while loop, the script works and prints the correct output. But when I test both patterns in the same while loop, the second pattern matches every time and the loop never ends.

My script:

    #!/usr/bin/perl -w
    use strict;

    while (<STDIN>) {
        while ( (m/<span class=\"itempp\">([^<]+)+?<\/span>/g) || (m/<font size=\"-1\">([^<]+)+?<\/font>/g) ) {
            print "$1\n";
        }
    }

I am testing the above script with sample input:

    <a href="http://linkTest">Link title</a>
    <span class="itempp">$150</span>
    <font size="-1"> (Location)</font>

Expected output:

    $150
    (Location)

Thanks! Any help would be greatly appreciated!

+6
3 answers

Whenever a regular expression with the /g modifier fails to match, it resets the position (pos) at which the next /g match on that string will start. So when the first of your two patterns fails, it forces the second to search again from the beginning of the line, where it finds the same match every time.
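The reset can be observed without hanging the terminal. The following is a minimal sketch, not part of the original script: count_iterations and its $limit cap are hypothetical names introduced here only to stop the loop after a few passes.

```perl
use strict;
use warnings;

# Hypothetical helper: run the two-pattern loop from the question on one
# line and count how often the body executes, giving up after $limit passes.
sub count_iterations {
    my ($line, $limit) = @_;
    my $count = 0;
    while ( $line =~ m|<span class="itempp">([^<]+)</span>|g
         or $line =~ m|<font size="-1">([^<]+)</font>|g )
    {
        # Without this cap the loop would never end for a <font> line.
        last if ++$count >= $limit;
    }
    return $count;
}

# The failed <span> match resets pos, so the <font> match repeats forever.
print count_iterations('<font size="-1"> (Location)</font>', 5), "\n";

# A line with only a <span> terminates after one pass.
print count_iterations('<span class="itempp">$150</span>', 5), "\n";
```

The first call hits the cap because every pass finds the same <font> element again; the second call stops on its own because both patterns fail once the <span> has been consumed.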

This behavior can be disabled by adding the /c modifier, which leaves the position unchanged if the regular expression does not match.
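The difference between /g and /gc can be checked directly with pos(). This is a small illustrative sketch; the helper name pos_after_failure and the test string are made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper: do one successful /g match, then one failing match,
# and return pos() before and after the failure.
# A true $keep uses /gc for the failing match instead of /g.
sub pos_after_failure {
    my ($keep) = @_;
    my $text = 'aaa bbb';

    my $first = $text =~ /aaa/g;        # succeeds; pos($text) is now 3
    my $before = pos($text);

    my $second = $keep
        ? $text =~ /zzz/gc              # fails; /c leaves pos($text) alone
        : $text =~ /zzz/g;              # fails; pos($text) becomes undef
    my $after = pos($text);

    return ($before, $after);
}

for my $keep (0, 1) {
    my ($before, $after) = pos_after_failure($keep);
    printf "%-3s before=%s after=%s\n",
        $keep ? '/gc' : '/g', $before, defined $after ? $after : 'undef';
}
```

With /g the position is undefined after the failed match, so the next /g search starts at the beginning of the string; with /gc it stays where the last successful match ended.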

In addition, you can improve your patterns by removing the unnecessary escapes ( " does not need to be escaped, and / does not need to be escaped if you choose a different delimiter) and the redundant +? after the capture group.

Also, use warnings is much better than -w on the command line.

Here is the working version of your code.

    use strict;
    use warnings;

    while (<STDIN>) {
        while ( m|<span class="itempp">([^<]+)</span>|gc
             or m|<font size="-1">([^<]+)</font>|gc )
        {
            print "$1\n";
        }
    }
+9
    while (<DATA>) {
        if (m{<(?:span class="itempp"|font size="-1")>\s*([^<]+)}i) {
            print "$1\n";
        }
    }
    __DATA__
    <a href="http://linkTest">Link title</a>
    <span class="itempp">$150</span>
    <font size="-1"> (Location)</font>
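The single-alternation pattern can also be used in a while loop with /g, which may help when one line holds several elements. The sketch below is one possible variation, not the answerer's code; extract_items is a hypothetical helper name. Because only one pattern is involved, a failed match simply ends the loop and the position reset does no harm.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every captured text from one line using a
# single alternation, so only one /g search position is in play.
sub extract_items {
    my ($line) = @_;
    my @items;
    while ( $line =~ m{<(?:span class="itempp"|font size="-1")>\s*([^<]+)}g ) {
        push @items, $1;
    }
    return @items;
}

# Both elements sit on the same line here; each is found exactly once.
print "$_\n" for extract_items(
    '<span class="itempp">$150</span> <font size="-1"> (Location)</font>'
);
```

The \s* before the capture group drops the leading space in front of (Location), as in the pattern above.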
+3

You do not change $_ during or after the match, so the same match succeeds every time and the loop never terminates.

To fix this, you can add $_ = $'; after the print so that the next match runs against the rest of the line.

-3

Source: https://habr.com/ru/post/921594/

