How to get a regex to start at the beginning of the string

This is an odd problem I ran into (I have probably seen it before without paying attention to it).

Here's the gist of the code:

    my $url  = 'http://twitter.com/' . $handle;
    my $page = get($url);

    if ($page =~ m/Web<\/span>\s*<a href=\"(.+?)\"/gi) {
        $website = $1;
    }
    if ($page =~ m/follower_count\" class=\"stats_count numeric\">(.+?)\s*</g) {
        $num_followers = $1;
    }

It fetches the Twitter page and runs two regular expressions to capture the number of followers and the user's website. This code works fine. But when you switch the order and look for the website AFTER looking for the follower count, the website comes back empty. As it turns out, the string seems to remember the position where the last match was made. In the HTML, the follower count appears after the website. If you match the follower count first, the website regex then starts from where the follower-count match left off (as if there were an index pointer into the string).

What puzzled me was that I had the "g" modifier at the end, which I took to mean "global", as in "search the string globally ... from the very beginning".

Am I missing something? I cannot understand why the regex resumes from the position of the last match in the string (if that makes sense).

+4
4 answers

The /g modifier in scalar context does not do what you think. Get rid of it.

As perlretut explains, /g in scalar context steps through the matches one at a time. It is intended for use in a loop, for example:

    while ($str =~ /pattern/g) {
        # match on each occurrence of 'pattern' in $str in turn
    }

Another way to use /g is in a list context:

    my @results = $str =~ /pattern/g;  # collect each occurrence of 'pattern' within $str into @results

If you use /g in scalar context and you are not iterating over the matches, you are almost certainly using it incorrectly.
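
Applied to the code in the question, dropping /g might look like the sketch below. The markup in $page is a made-up stand-in for the real Twitter page (an assumption, so that the script runs without a network request); only the two patterns come from the question.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the fetched page; the real markup may differ.
my $page = '<span>Web</span> <a href="http://example.com">site</a> '
         . '<span id="follower_count" class="stats_count numeric">1,234 </span>';

my ($website, $num_followers);

# Follower count first, website second: the order that failed with /g.
if ($page =~ m/follower_count" class="stats_count numeric">(.+?)\s*</) {
    $num_followers = $1;
}
if ($page =~ m/Web<\/span>\s*<a href="(.+?)"/i) {
    $website = $1;
}

print "followers: $num_followers\n";
print "website:   $website\n";
```

Without /g each match starts at the beginning of $page, so the order of the two tests no longer matters.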

+12

To quote perlop on Regexp Quote-Like Operators:

In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the pos() function; see pos. A failed match normally resets the search position to the beginning of the string, but you can avoid that by adding the /c modifier (for example, m//gc). Modifying the target string also resets the search position.

So, in scalar context (which you are using), /g does not mean "search from the very beginning"; it means "search starting from the string's pos". Searching from the beginning is the default behaviour (without /g).
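
The effect can be reproduced on a tiny string. This is only an illustrative sketch; the sample string and variable names are made up for the example.

```perl
use strict;
use warnings;

my $str = 'first second';   # hypothetical sample string

# A successful scalar-context /g match leaves pos() just after the match.
my $found_second = $str =~ /second/g ? 1 : 0;
my $pos_after    = pos($str);

# The next /g match starts at that position, so 'first' is never seen.
my $found_first_g = $str =~ /first/g ? 1 : 0;

# Without /g the search ignores pos() and starts at the beginning.
my $found_first = $str =~ /first/ ? 1 : 0;

print "second: $found_second (pos $pos_after)\n";
print "first with /g: $found_first_g\n";
print "first without /g: $found_first\n";
```

This mirrors the question: the follower count sits later in the page than the website link, so matching it first with /g moves the position past the link.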

/g is usually used when you want to find all matches of a regular expression in a string, not just the first one. In list context, it does this by returning a list of all the matches. In scalar context, it does this by starting the search from where the previous search stopped (usually in a loop).

+5

The gist is that matches made with /g keep the position of the last match, so the next time the string is matched, the regular expression starts from there. In scalar context, this is usually done to get multiple successive matches in a while loop; in list context, /g returns all successive (non-overlapping) matches. You can read more about this in perlretut, in the "Global matching" section, and in perlop under "Regexp Quote-Like Operators".

You can see the current position using the pos function. You can also set the position by using pos as an lvalue: pos($string) = 0; resets the position to the beginning of the string.
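
A minimal sketch of rewinding the position by hand, using a made-up sample string:

```perl
use strict;
use warnings;

my $str = 'first second';   # hypothetical sample string

my $ok = $str =~ /second/g; # leaves pos($str) at the end of 'second'
pos($str) = 0;              # rewind to the start of the string

# With the position rewound, a /g match can find 'first' again.
my $found = $str =~ /first/g ? 1 : 0;
print "found: $found, pos now ", pos($str), "\n";
```

This works, but for a one-off match it is usually simpler to leave /g off than to reset pos before every test.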

There is little reason to use /g in scalar context outside a loop, since you can get the same functionality using the \G assertion.

... of course, then no one remembers how \G works, and you are back to square one, but that's another topic.
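
As a short reminder of what \G does, here is a sketch with a made-up string: \G anchors a match at the current pos() of the string, so a later pattern can pick up exactly where an earlier /g match stopped.

```perl
use strict;
use warnings;

my $str = 'key=value';      # hypothetical sample string

# Consume the key with /g so that pos($str) sits right after it.
my $key_ok = $str =~ /\w+/g ? 1 : 0;     # pos($str) is now 3

# \G anchors this match at pos($str); no /g is needed here.
my ($value) = $str =~ /\G=(\w+)/;

print "value: $value\n";
```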

+2

m//g does not reset the position. You need to do it manually. See this for reference: http://perldoc.perl.org/functions/pos.html

I believe you can just set pos to 0 or undef and it will work.

0

Source: https://habr.com/ru/post/1334112/

