Perl regex - How to make it less greedy?

How can I count the number of empty "fields" in the following line? Empty fields are indicated by -| (at the start), |-| (in the middle), or |- (at the end). The regular expression I came up with seems to work, except when there are consecutive empty fields. How can I make it less greedy?

 my $string = 'P|CHNA|string-string|-|-|25.75|-|2562000|-0.06';
 my $count = () = ($string =~ /(?:^-\||\|-$|\|-\|)/g);
 printf("$count\n");

The above code prints 2 instead of the 3 that I want.

+6
3 answers

The trick is to use lookarounds. A first attempt might look like this:

 my $count = () = $string =~ /
     (?<=\|)   # Preceded by "|"
     (-)
     (?=\|)    # Followed by "|"
 /xg;

But that does not work. The problem with the above is that it does not detect that the first field or the last field is empty. Two ways to fix this:

 my $count = () = "|$string|" =~ /
     (?<=\|)   # Preceded by "|"
     (-)
     (?=\|)    # Followed by "|"
 /xg;

or

 my $count = () = $string =~ /
     (?<![^|])   # Not preceded by a char other than "|"
     (-)
     (?![^|])    # Not followed by a char other than "|"
 /xg;
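
As a quick check of the edge cases, here is a minimal sketch that runs the three variants side by side, with the lookbehind written as (?<=\|). The sub names and the second test string are made up for illustration; they are not part of the question.

```perl
use strict;
use warnings;

# Hypothetical helper names; each wraps one of the variants discussed above.
sub count_naive {
    my ($s) = @_;
    my $n = () = $s =~ /(?<=\|)-(?=\|)/g;
    return $n;
}

sub count_padded {
    my ($s) = @_;
    my $n = () = "|$s|" =~ /(?<=\|)-(?=\|)/g;
    return $n;
}

sub count_negated {
    my ($s) = @_;
    my $n = () = $s =~ /(?<![^|])-(?![^|])/g;
    return $n;
}

# The second string is an invented example with empty first and last fields.
for my $s ('P|CHNA|string-string|-|-|25.75|-|2562000|-0.06', '-|a|-|-|b|-') {
    printf "%-50s naive=%d padded=%d negated=%d\n",
        $s, count_naive($s), count_padded($s), count_negated($s);
}
```

All three agree on the string from the question, but only the padded and negated variants count the empty first and last fields of the second string.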
+2

I would completely avoid the regex route and instead treat this as a list because it is one:

 my $count = grep { /^-$/ } split /\|/, $string; 
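
The one-liner works because grep returns the number of matching elements when called in scalar context. A small sketch of that behaviour follows, using the string from the question. The variable names are illustrative, and `eq '-'` is swapped in for `/^-$/` here as a slightly stricter test, since `/^-$/` would also accept "-\n".

```perl
use strict;
use warnings;

my $string = 'P|CHNA|string-string|-|-|25.75|-|2562000|-0.06';
my @fields = split /\|/, $string;

# Scalar context: grep returns how many elements passed the test.
my $count = grep { $_ eq '-' } @fields;

# List context: grep returns the elements themselves; here we filter
# the index range to learn which fields are empty.
my @empty_idx = grep { $fields[$_] eq '-' } 0 .. $#fields;

print "count: $count\n";
print "empty field indices: @empty_idx\n";
```

Working on the list also makes it easy to answer follow-up questions, such as which fields are empty, without touching the regex.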
+7

The problem actually has nothing to do with greediness or laziness (which applies only to quantifiers such as * or +).
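
For comparison, this is what greediness does control. The sketch below uses a made-up three-field string and shows a greedy and a lazy quantifier picking different amounts of text with the same surrounding pattern:

```perl
use strict;
use warnings;

my $text = 'a|b|c';   # invented sample input

# Greedy: .* grabs as much as possible, then backs off to the last "|".
my ($greedy) = $text =~ /^(.*)\|/;

# Lazy: .*? grabs as little as possible, stopping at the first "|".
my ($lazy) = $text =~ /^(.*?)\|/;

print "greedy: $greedy\n";   # a|b
print "lazy:   $lazy\n";     # a
```

The pattern in the question contains no quantifiers at all, so there is nothing to make less greedy.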

The problem is two empty fields next to each other: |-|-|. The first one is matched, but the second one then fails because its opening | has already been consumed by the first match. The remaining -| cannot match the ^-\| alternative either, because it is not at the beginning of the string.
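
To see this, here is a small sketch that records where the question's pattern matches. The only change to the pattern is a capturing group in place of the non-capturing one, so the matched text can be read from $1. The second pattern is one possible variation, not taken from the question: it uses a lookahead so that the closing | is checked but not consumed.

```perl
use strict;
use warnings;

my $string = 'P|CHNA|string-string|-|-|25.75|-|2562000|-0.06';

# Collect [offset, text] for every match of the question's pattern.
my @matches;
while ($string =~ /(^-\||\|-$|\|-\|)/g) {
    push @matches, [ $-[0], $1 ];
}
printf "offset %2d: %s\n", @$_ for @matches;

# Variation (an assumption, not from the question): the lookahead leaves
# the closing "|" available as the opening "|" of the next field.
my $fixed = () = $string =~ /(?:^|\|)-(?=\||\z)/g;
print "with lookahead: $fixed\n";
```

The first loop finds only two matches, because after matching |-| the search resumes at the following -, with no | left in front of it.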

I think a much simpler approach would be to split your input on | and then look for any fields consisting only of -:

 my $count = 0;
 foreach (split(/\|/, $string)) {
     if (/^-$/) {
         $count++;
     }
 }

There really is no way to implement this effectively with a regex, since Perl does not support variable-length lookbehind (at least as far as I know). One way to "cheat" would be to add | at the beginning and end; then you can successfully use lookbehind/lookahead assertions:

 $string = "|$string|";
 my $count = () = $string =~ /(?<=\|)-(?=\|)/g;

(Another answer gives an alternative solution that uses lookaround assertions without modifying the string, so I was wrong when I said "there is no way to implement this with a regex." I think splitting on | is the best way to solve this problem.)

+3

Source: https://habr.com/ru/post/956118/

