How can I handle overlapping characters between consecutive matches in a Perl regular expression substitution?

The following lines of comma-separated values contain several consecutive empty fields:

my $rawData =
    "2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear\n"
  . "2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n";

I want to replace these empty fields with "N/A" values, so I decided to do it with a regex substitution.

I tried this first:

$rawData =~ s/,([,\n])/,N\/A$1/g; # RELABEL UNAVAILABLE DATA AS 'N/A'

which returned

2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear\n
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,,N/A,\n

Not what I wanted. The problem occurs when more than two commas appear in a row. The regular expression consumes two commas at a time, so when it resumes scanning the string it starts at the third comma, not the second.
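The skipping can be seen on a small input. This is a minimal sketch, assuming a made-up string with four empty fields ($demo, @starts and $consumed are hypothetical names, not from the original code):

```perl
use strict;
use warnings;

# Hypothetical sample: "a" followed by four empty fields.
my $demo = "a,,,,\n";

# Record the offset at which each match of ",[,\n]" begins.
# Every match consumes the comma AND the delimiter after it,
# so the next search resumes past that delimiter.
my @starts;
while ($demo =~ /,[,\n]/g) {
    push @starts, $-[0];
}
print "@starts\n";    # prints "1 3": the commas at offsets 2 and 4 never start a match

# The same pattern used in a substitution therefore fills only every other gap.
(my $consumed = $demo) =~ s/,([,\n])/,N\/A$1/g;
print $consumed;      # prints "a,N/A,,N/A,"
```

Only the commas at offsets 1 and 3 begin a match, which is why every second empty field is left untouched.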

I thought lookahead or lookbehind assertions might help here, so I tried the following expression:

$rawData =~ s/(?<=,)([,\n])|,([,\n])$/,N\/A$1/g; # RELABEL UNAVAILABLE DATA AS 'N/A'

resulting in:

2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,N/A,Clear\n
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,N/A,,N/A,,N/A,,N/A\n

That didn't work either. It just shifted the comma pairing by one.

How can I get a single substitution to fill every empty field?

The final string should look like this:

2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear\n
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,N/A,N/A,N/A\n

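I can't quite tell what the lookbehind example was meant to do, but it most likely suffers from a precedence problem: everything after the lookbehind would need to be wrapped in (?: ... ) so that the | does not bypass the lookbehind.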
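In Perl regexes, | binds more loosely than concatenation, so a lookbehind written before an alternation applies only to the first alternative unless the alternatives are grouped with (?: ... ). A minimal sketch with made-up patterns (not the ones from the question; $ungrouped and $grouped are hypothetical names):

```perl
use strict;
use warnings;

# Parsed as: ( (?<=,)a ) OR ( b ). The lookbehind guards only "a".
my $ungrouped = qr/(?<=,)a|b/;

# The non-capturing group makes the lookbehind guard both alternatives.
my $grouped = qr/(?<=,)(?:a|b)/;

for my $s ("xb", ",b", ",a", "xa") {
    printf "%-3s ungrouped=%d grouped=%d\n", $s,
        ($s =~ $ungrouped ? 1 : 0),
        ($s =~ $grouped   ? 1 : 0);
}
```

The string "xb" matches the ungrouped pattern even though no comma precedes the b, while the grouped pattern rejects it.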

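Starting from scratch, what you want is fairly simple: insert N/A after a comma whenever that comma is followed by another comma or a newline: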

s!,(?=[,\n])!,N/A!g;

Example:

my $rawData = "2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear\n2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n";

use Data::Dumper;
$Data::Dumper::Useqq = $Data::Dumper::Terse = 1;
print Dumper($rawData);
$rawData =~ s!,(?=[,\n])!,N/A!g;
print Dumper($rawData);

Output:

"2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear\n2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n"
"2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear\n2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,N/A,N/A,N/A\n"

EDIT: Note that you can open a filehandle on the string and let readline deal with the line endings:

#!/usr/bin/perl

use strict; use warnings;
use autodie;

my $str = <<EO_DATA;
2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,
EO_DATA

open my $str_h, '<', \$str;

while(my $row = <$str_h>) {
    chomp $row;
    print join(',',
        map { length $_ ? $_ : 'N/A'} split /,/, $row, -1
    ), "\n";
}

Output:

E:\Home> t.pl
2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,N/A,N/A,N/A

You can also use:

pos $str -= 1 while $str =~ s{,(,|\n)}{,N/A$1}g;

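Explanation: when s/// finds ,, and replaces it with ,N/A, it has already moved past the second comma. So some consecutive commas are missed if you only use

$str =~ s{,(,|\n)}{,N/A$1}g;

That is why the loop repeats the substitution, stepping pos $str back by one character after each successful pass.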

However, as @ysth shows:

$str =~ s!,(?=[,\n])!,N/A!g;

makes the while unnecessary.


You could search for

(?<=,)(?=,|$)

and replace that with N/A.

This regular expression matches the empty position between two commas or between a comma and the end of a line.
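One caveat when applying this to a multi-line string like the one in the question: without the /m modifier, $ matches only at the end of the string (or just before a final newline), so a trailing empty field on an earlier line is missed. A small sketch, assuming /m is added and using a shortened, made-up sample ($csv, $without_m and $with_m are hypothetical names):

```perl
use strict;
use warnings;

# Hypothetical two-line sample with empty fields in the middle and at line ends.
my $csv = "a,,b,\nc,,,\n";

# Without /m: "$" only matches before the final newline of the whole string.
(my $without_m = $csv) =~ s/(?<=,)(?=,|$)/N\/A/g;

# With /m: "$" matches before every newline, so each line end is handled.
(my $with_m = $csv) =~ s/(?<=,)(?=,|$)/N\/A/mg;

print $without_m;   # first line keeps its trailing empty field: "a,N/A,b,"
print $with_m;      # every empty field is filled
```

Because both assertions are zero-width, nothing is consumed and N/A is simply inserted at each matching position.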


A quick and dirty hack:

my $rawData = "2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n";
while ($rawData =~ s/,,/,N\/A,/g) {};
print $rawData;

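Not the fastest code, but the shortest. It needs at most two substitution passes. Note that it does not fill an empty field at the very end of a line, because that comma is followed by a newline rather than another comma.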
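The number of passes can be checked on a single line. A minimal sketch, assuming a made-up row ($line and $passes are hypothetical names) and counting only the passes that actually substituted something:

```perl
use strict;
use warnings;

# Hypothetical row ending in four consecutive commas.
my $line = "10.4,,,,\n";

# Re-run the global substitution until it no longer finds ",,".
my $passes = 0;
$passes++ while $line =~ s/,,/,N\/A,/g;

print "passes: $passes\n";   # prints "passes: 2"
print $line;                 # prints "10.4,N/A,N/A,N/A," (last field still empty)
```

The first pass fills every other gap, the second fills the gaps that the first pass exposed, and the comma directly before the newline is never matched by ,, at all.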


Not a regular expression, but not too complicated:

$string = join ",", map{$_ eq "" ? "N/A" : $_} split (/,/, $string,-1);

The ,-1 at the end is needed to force split to keep any empty fields at the end of the line.
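The effect of the limit argument is easy to see on a row that ends in empty fields. A minimal sketch with a made-up row ($row, @default and @all are hypothetical names):

```perl
use strict;
use warnings;

# Hypothetical row whose last four fields are empty (no trailing newline).
my $row = "10.4,,,,";

my @default = split /,/, $row;        # trailing empty fields are discarded
my @all     = split /,/, $row, -1;    # a negative limit keeps them

print scalar(@default), "\n";         # prints 1
print scalar(@all), "\n";             # prints 5

# Fill the empty fields and put the row back together.
my $filled = join ",", map { length $_ ? $_ : "N/A" } @all;
print "$filled\n";                    # prints 10.4,N/A,N/A,N/A,N/A
```

If the row still carries its newline, chomp it first; otherwise the last field is "\n" rather than the empty string and is not replaced.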


Source: https://habr.com/ru/post/1721433/

