Calculate the number of consecutive characters in a string using Perl

I have a string made up of several runs of repeated characters:

aaabbcccdddd 

I want to represent this as: a3b2c3d4

So far, I have come up with the following:

 #! /usr/bin/perl
 $str = "aaabbcccdddd";
 $str =~ s/(.)\1+/$1/g;
 print $str."\n";

Output:

 abcd 

It matches each run of consecutive characters, stores one character in the capture buffer, and replaces the run with that single character. However, I want to count the consecutive characters in each run and then print one character followed by that count, so that the output is a3b2c3d4 instead of abcd.

What modification is required for the above regular expression?

3 answers

It seems the /e (evaluate) modifier is needed on the substitution, so that the replacement text is treated as a piece of Perl code:

  $str =~ s/((.)\2+)/$2 . length($1)/ge; 

Script

 #!/usr/bin/env perl
 use strict;
 use warnings;

 my $original    = "aaabbcccdddd";
 my $alternative = "aaabbcccddddeffghhhhhhhhhhhh";

 sub proc1 {
     my($str) = @_;
     $str =~ s/(.)\1+/$1/g;
     print "$str\n";
 }

 proc1 $original;
 proc1 $alternative;

 sub proc2 {
     my($str) = @_;
     $str =~ s/((.)\2+)/$2 . length($1)/ge;
     print "$str\n";
 }

 proc2 $original;
 proc2 $alternative;

Output

 abcd
 abcdefgh
 a3b2c3d4
 a3b2c3d4ef2gh12

Could you break down the regex to explain how it works?

I assume it is the match part that is problematic, not the replacement part.

Original regex:

 (.)\1+ 

This captures one character, (.), followed by the same character repeated one or more times.

The revised regex is the same, but also captures the entire pattern:

 ((.)\2+) 

The first open parenthesis starts the overall capture; the second open parenthesis starts the capture of a single character. That single character is now the second capture group, so \1 in the original must become \2 in the revised version.

Since the match captures the entire run of duplicate characters, the replacement can easily determine the length of the run.
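To see what the two capture groups hold for each run, here is a minimal sketch. The helper name runs is only an illustration and does not come from the answer above; it collects $1 (the whole run) and $2 (the single repeated character) for every match.

```perl
use strict;
use warnings;

# Hypothetical helper: return [whole run, single char] for each match of ((.)\2+).
sub runs {
    my ($str) = @_;
    my @found;
    while ($str =~ /((.)\2+)/g) {
        push @found, [$1, $2];   # $1 = whole run, $2 = the repeated character
    }
    return @found;
}

for my $run (runs("aaabbcccdddd")) {
    my ($whole, $char) = @$run;
    # Prints lines such as "aaa -> a3"
    printf "%s -> %s%d\n", $whole, $char, length $whole;
}
```

The replacement expression $2 . length($1) in the substitution builds exactly the right-hand side printed here.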


The following works if you can live with the slowdown caused by $& :

 $str =~ s/(.)\1*/$1. length $&/ge; 

Changing * to + in the above expression leaves non-repeated characters unchanged, that is, no count of 1 is appended to them.

As JRFerguson points out, Perl 5.10+ provides the equivalent variable ${^MATCH} (used together with the /p modifier), which does not affect the performance of other regular expressions:
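The difference between the two quantifiers can be sketched as follows. The helper names encode_star and encode_plus are made up for this illustration, and the sketch uses a nested capture group instead of $& so that it does not depend on that variable.

```perl
use strict;
use warnings;

# Hypothetical helper: with *, every run gets a count, including runs of length 1.
sub encode_star {
    my ($str) = @_;
    $str =~ s/((.)\2*)/$2 . length($1)/ge;
    return $str;
}

# Hypothetical helper: with +, only runs of two or more characters are rewritten.
sub encode_plus {
    my ($str) = @_;
    $str =~ s/((.)\2+)/$2 . length($1)/ge;
    return $str;
}

print encode_star("aabcc"), "\n";   # prints a2b1c2
print encode_plus("aabcc"), "\n";   # prints a2bc2
```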

 $str =~ s/(.)\g{1}+/$1. length ${^MATCH}/pge; 

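On Perl 5.6+, you can avoid the performance hit by using the match offset arrays @- and @+ (note the plain \1 backreference here, since \g{1} is only available from Perl 5.10):

 $str =~ s/(.)\1+/ $1 . ( $+[0] - $-[0] ) /ge;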
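For readers unfamiliar with the offset arrays, here is a small sketch of what they contain after each match: $-[0] is the offset where the whole match starts and $+[0] is the offset just past its end, so their difference is the length of the run. The helper name run_offsets is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return [char, start offset, end offset] for each run.
sub run_offsets {
    my ($str) = @_;
    my @rows;
    while ($str =~ /(.)\1+/g) {
        # $-[0] = start of the whole match, $+[0] = offset just past its end
        push @rows, [$1, $-[0], $+[0]];
    }
    return @rows;
}

for my $row (run_offsets("aaabbcccdddd")) {
    my ($char, $start, $end) = @$row;
    printf "%s: start=%d end=%d length=%d\n", $char, $start, $end, $end - $start;
}
```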

The same approach in JavaScript, returning the replacement from the callback:

 let data = "ababaaaabbbababb";
 // p1 is the whole run, p2 is the single repeated character
 data = data.replace(/((.)\2+)/g, (match, p1, p2) => p2 + p1.length);
 console.log(data);

Source: https://habr.com/ru/post/917722/
