You might be able to mix and match regex and code -

    $line =~ /(?{($cnt,@ary)=(0,)})^(?:([^,]+)(?{push @ary,$cnt; push @ary,$^N})|,(?{$cnt++}))+/x
      and print join( ',', @ary);

expanded -

    $line =~ /
        (?{($cnt,@ary)=(0,)})
        ^(?:
            ([^,]+)
            (?{push @ary,$cnt; push @ary,$^N})
          |
            ,
            (?{$cnt++})
        )+
    /x and print join( ',', @ary);
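To see what the pattern collects, here is a minimal standalone sketch. It assumes the sample line used in the benchmarks of this answer and declares `$cnt` and `@ary` as package variables with `our` so that it runs under `use strict`; the comments are mine, not part of the original pattern.

```perl
use strict;
use warnings;

# Sample line borrowed from the benchmarks in this answer.
my $line = "x,,10.3,,q,,5.2,3.1,,,ghy,g,,l,p";

# Package variables, so the embedded code blocks can see them under strict.
our ($cnt, @ary);

$line =~ /
    (?{ ($cnt, @ary) = (0) })          # reset the field counter and the result list
    ^(?:
        ([^,]+)                        # a non-empty field
        (?{ push @ary, $cnt, $^N })    # record its index and its text
      |
        ,                              # a comma
        (?{ $cnt++ })                  # moves on to the next field index
    )+
/x;

print join(',', @ary), "\n";
```

For this sample line it should print `0,x,2,10.3,4,q,6,5.2,7,3.1,10,ghy,11,g,13,l,14,p`, that is, each non-empty field preceded by its zero-based position.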
Some benchmarks
With a small tweak to the flesk and sln solutions (look for fleskNew and slnNew),
the winner is fleskNew, once the substitution operator is removed.
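As a rough illustration of the fleskNew idea outside the benchmark harness, the sketch below walks the line with `m//g` in scalar context instead of `s///`, so the string is never modified. The lexical variables and the final `print` are additions for the demo; the loop itself follows the fleskNew entry in the benchmark.

```perl
use strict;
use warnings;

# Sample line borrowed from the benchmarks in this answer.
my $line = "x,,10.3,,q,,5.2,3.1,,,ghy,g,,l,p";

my ($i, @vars) = (0);

# Each successful match resumes where the previous one ended.
while ($line =~ /(,*)([^,]+)/g) {
    $i += length $1;       # $1: the commas skipped before this field
    push @vars, $i, $2;    # $2: the field itself, stored after its index
}

print join(',', @vars), "\n";
```

This should produce the same index/value list as the regex-with-code version, and `$line` still holds the original text afterwards, which is why this variant needs no restore step in the benchmark.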
code -
    use Benchmark qw( cmpthese ) ;

    $samp = "x,,10.3,,q,,5.2,3.1,,,ghy,g,,l,p";
    $line = $samp;

    cmpthese( -5, {
        flesk1 => sub{
            $index = 0;
            join ",",
              map  {join ",", @$_}
              grep $_->[1],
              map  {[$index++, $_]}
              split ",", $line;
        },
        flesk2 => sub{
            ($i, @vars) = (0,);
            while ($line =~ s/^(,*)([^,]+)//) {
                push @vars, $i += length($1), $2;
            }
            $line = $samp;
        },
        fleskNew => sub{
            ($i, @vars) = (0,);
            while ($line =~ /(,*)([^,]+)/g) {
                push @vars, $i += length($1), $2;
            }
        },
        sln1 => sub{
            $line =~ /
                (?{($cnt,@ary)=(0,)})
                ^(?:
                    ([^,]+)
                    (?{push @ary,$cnt; push @ary,$^N})
                  |
                    ,
                    (?{$cnt++})
                )+
            /x
        },
        slnNew => sub{
            $line =~ /
                (?{($cnt,@ary)=(0,)})
                (?:
                    (,*)
                    (?{$cnt += length($^N)})
                    ([^,]+)
                    (?{push @ary, $cnt,$^N})
                )+
            /x
        },
    } );
numbers -
                Rate flesk1   sln1 flesk2 slnNew fleskNew
    flesk1   20325/s     --   -51%   -52%   -56%     -60%
    sln1     41312/s   103%     --    -1%   -10%     -19%
    flesk2   41916/s   106%     1%     --    -9%     -17%
    slnNew   45978/s   126%    11%    10%     --      -9%
    fleskNew 50792/s   150%    23%    21%    10%       --
Some benchmarks 2
Adds Birei's in-place replace-and-trim (all-in-one) solution.
Deviations:
flesk1 (now flesk1a) has been modified to remove the final 'join', since none of the other regex solutions include one. This gives a fairer bench.
Birei deviates in the bench because it modifies the original line as its final result.
This aspect cannot be removed. The difference between Birei1 and BireiNew is that the new one removes the final ','.
flesk2, Birei1 and BireiNew have the additional overhead of restoring the original line
because they use the substitution operator.
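The restore overhead comes from `s///` consuming the string as it goes. A small sketch of the flesk2 loop on a shortened, made-up sample (`x,,10.3` is chosen only to keep the trace short) shows the line ending up empty, which is why the benchmark has to reassign `$line = $samp` on every iteration:

```perl
use strict;
use warnings;

# Shortened sample, for illustration only.
my $samp = "x,,10.3";
my $line = $samp;

my ($i, @vars) = (0);

# Each pass strips the leading commas and the next field off the front of $line.
while ($line =~ s/^(,*)([^,]+)//) {
    $i += length $1;
    push @vars, $i, $2;
}

print "collected: ", join(',', @vars), "\n";
print "remaining: '$line'\n";    # nothing is left of the original text

# The extra step that the non-destructive variants do not need.
$line = $samp;
```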
The winner still looks like fleskNew.
code -
    use Benchmark qw( cmpthese ) ;

    $samp = "x,,10.3,,q,,5.2,3.1,,,ghy,g,,l,p";
    $line = $samp;

    cmpthese( -5, {
        flesk1a => sub{
            $index = 0;
            map  {join ",", @$_}
            grep $_->[1],
            map  {[$index++, $_]}
            split ",", $line;
        },
        flesk2 => sub{
            ($i, @vars) = (0,);
            while ($line =~ s/^(,*)([^,]+)//) {
                push @vars, $i += length($1), $2;
            }
            $line = $samp;
        },
        fleskNew => sub{
            ($i, @vars) = (0,);
            while ($line =~ /(,*)([^,]+)/g) {
                push @vars, $i += length($1), $2;
            }
        },
        sln1 => sub{
            $line =~ /
                (?{($cnt,@ary)=(0,)})
                ^(?:
                    ([^,]+)
                    (?{push @ary,$cnt; push @ary,$^N})
                  |
                    ,
                    (?{$cnt++})
                )+
            /x
        },
        slnNew => sub{
            $line =~ /
                (?{($cnt,@ary)=(0,)})
                (?:
                    (,*)
                    (?{$cnt += length($^N)})
                    ([^,]+)
                    (?{push @ary, $cnt,$^N})
                )+
            /x
        },
        Birei1 => sub{
            $i = -1;
            $line =~
              s/
                (?(?=,+)
                    ( (?: , (?{ ++$i }) )+ )
                  | (?<no_comma> [^,]+ ,? ) (?{ ++$i })
                )
              /
                defined $+{no_comma}
                  ? $i . qq[,] . $+{no_comma}
                  : qq[]
              /xge;
            $line = $samp;
        },
        BireiNew => sub{
            $i = 0;
            $line =~
              s/
                (?: , (?{++$i}) )*
                (?<data> [^,]* )
                (?: ,*$ )?
                (?= (?<trailing_comma> ,?) )
              /
                length $+{data}
                  ? "$i,$+{data}$+{trailing_comma}"
                  : ""
              /xeg;
            $line = $samp;
        },
    } );
Results -
                Rate BireiNew Birei1 flesk1a flesk2   sln1 slnNew fleskNew
    BireiNew  6030/s       --   -18%    -74%   -85%   -86%   -87%     -88%
    Birei1    7389/s      23%     --    -68%   -82%   -82%   -84%     -85%
    flesk1a  22931/s     280%   210%      --   -44%   -45%   -51%     -54%
    flesk2   40933/s     579%   454%     79%     --    -2%   -13%     -17%
    sln1     41752/s     592%   465%     82%     2%     --   -11%     -16%
    slnNew   47088/s     681%   537%    105%    15%    13%     --      -5%
    fleskNew 49563/s     722%   571%    116%    21%    19%     5%       --