How to use Perl for intersecting characters between consecutive matches with regular expression replacement?

Question


The following lines of comma-separated values contain several consecutive empty fields:

my $rawData =
"2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear\n" .
"2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n";

I want to replace these empty fields with "N/A", so I decided to do it with a regex substitution.

I tried this first:

$rawData =~ s/,([,\n])/,N\/A$1/g; # RELABEL UNAVAILABLE DATA AS 'N/A'

which returned

2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear\n
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,,N/A,\n

Not what I wanted. The problem appears when more than two commas occur in a row. Each match consumes two commas, so when the regex resumes scanning it starts at the third comma rather than the second.
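The difference between consuming the second comma and only looking at it can be seen by counting matches on a run of commas. A small sketch (the variable names are illustrative, not from the question):

```perl
use strict;
use warnings;

# Four commas in a row, i.e. three empty fields between them.
my $run = ',,,,';

# Consuming pattern: each match eats two commas, so matches cannot overlap.
my $consuming = () = $run =~ /,,/g;

# Lookahead pattern: only the first comma is consumed; the second is merely
# checked, so the next match can start on it.
my $overlapping = () = $run =~ /,(?=,)/g;

print "consuming: $consuming, overlapping: $overlapping\n";
```

The consuming pattern finds two matches, while the lookahead version finds one match per comma that is followed by another comma.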

I thought lookahead or lookbehind assertions might help, so I tried the following expression:

$rawData =~ s/(?<=,)([,\n])|,([,\n])$/,N\/A$1/g; # RELABEL UNAVAILABLE DATA AS 'N/A'

resulting in:

2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,N/A,Clear\n
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,N/A,,N/A,,N/A,,N/A\n

That didn't work either. It just shifted the extra commas by one position.

, , . , . ?

:

2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,N/A,Clear\n
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,N/A,,N/A,N/A,N/A,N/A,N/A\n


Zaid Oct 29 '09 at 19:50


EDIT: Here is a version that reads the data one line at a time with readline:

#!/usr/bin/perl

use strict; use warnings;
use autodie;

my $str = <<EO_DATA;
2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,
EO_DATA

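# Open an in-memory filehandle on the string so it can be read line by line.
open my $str_h, '<', \$str;

while(my $row = <$str_h>) {
    chomp $row;
    # A LIMIT of -1 makes split keep trailing empty fields;
    # each empty field is then replaced with 'N/A'.
    print join(',',
        map { length $_ ? $_ : 'N/A'} split /,/, $row, -1
    ), "\n";
}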

Output:

E:\Home> t.pl
2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,N/A,N/A,N/A

A substitution-based alternative:

pos $str -= 1 while $str =~ s{,(,|\n)}{,N/A$1}g;

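Explanation: s/// turns ,, into ,N/A, but the second comma is consumed by the match, so scanning resumes after it and, in a single pass, every other empty field in a run is skipped: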

$str =~ s{,(,|\n)}{,N/A$1}g;

pos $str .

Or, as @ysth suggested:

$str =~ s!,(?=[,\n])!,N/A!g;

This version does not need a while loop.


Sinan Ünür Oct 29 '09 at 19:54

Search for

(?<=,)(?=,|$)

and replace every match with N/A.

This regular expression matches the empty position between two commas, or between a comma and the end of a line.
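In Perl this could look like the sketch below, applied to the whole two-line string from the question. The /m flag is an addition of this sketch, not part of the answer: it makes $ match before every newline instead of only at the end of the string, which matters if an earlier line ends with an empty field.

```perl
use strict;
use warnings;

my $rawData = "2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear\n"
            . "2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n";

# Insert N/A at every zero-width position that is preceded by a comma and
# followed by a comma or a line end. Nothing is consumed, so adjacent empty
# fields are all found in one pass. /m lets $ match before each newline.
(my $filled = $rawData) =~ s{(?<=,)(?=,|$)}{N/A}gm;

print $filled;
```

Because both assertions are zero-width, no comma is ever consumed, which is exactly what the original attempts were missing.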


Tim Pietzcker Oct 29 '09 at 20:13


A quick and dirty hack:

my $rawData = "2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n";
# ,(,|\n) also catches an empty last field right before the newline
1 while $rawData =~ s/,(,|\n)/,N\/A$1/g;
print $rawData;

Not the fastest code, but the shortest. The substitution only has to change the string at most twice before there is nothing left to replace.
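To check the "at most two passes" claim, here is a sketch that counts the passes that actually change the string. It uses the ,(,|\n) pattern so that the trailing empty field is also filled; the counter variable is illustrative.

```perl
use strict;
use warnings;

my $rawData = "2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear\n"
            . "2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n";

# Repeat the substitution until it finds nothing to replace, counting how
# many passes made at least one substitution.
my $passes = 0;
$passes++ while $rawData =~ s/,(,|\n)/,N\/A$1/g;

print "passes: $passes\n", $rawData;
```

The first pass fills every other empty field in a run, the second fills the gaps it left, and a third pass finds no match and ends the loop.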


Jack M. Oct 29 '09 at 20:10


Not a regular expression, but not too complicated:

$string = join ",", map{$_ eq "" ? "N/A" : $_} split (/,/, $string,-1);

The -1 at the end is needed to force split to keep any empty fields at the end of the line.
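The effect of the -1 LIMIT is easy to see on a single line that ends in empty fields. A small sketch (the sample line is made up for illustration):

```perl
use strict;
use warnings;

my $line = '10.4,,,';

# Without a LIMIT, split drops trailing empty fields.
my @trimmed = split /,/, $line;

# A negative LIMIT keeps them, so every empty field can be mapped to N/A.
my @kept = split /,/, $line, -1;

my $filled = join ',', map { $_ eq '' ? 'N/A' : $_ } @kept;

printf "%d fields without -1, %d with -1: %s\n", scalar @trimmed, scalar @kept, $filled;
```

Without the -1 the three trailing empty fields vanish before map ever sees them, so they could not be relabeled.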


mob Oct 29 '09 at 20:16


ysth · Accepted Answer · 2009-10-29T20:12:40+0000

, , , , lookbehind (?: ... ), | lookbehind.

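Here you don't need a lookbehind at all: just use a lookahead to put N/A after each comma that is followed by another comma or a newline: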

s!,(?=[,\n])!,N/A!g;

Example:

my $rawData = "2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear\n2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n";

use Data::Dumper;
$Data::Dumper::Useqq = $Data::Dumper::Terse = 1;
print Dumper($rawData);
$rawData =~ s!,(?=[,\n])!,N/A!g;
print Dumper($rawData);

Output:

"2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear\n2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n"
"2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear\n2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,N/A,N/A,N/A\n"
