How to remove redundant fields and merge result rows

I am trying to process a simple text file. This is basically an index of names and related number fields, formatted as follows:

Nowosielski, Matthew, 484, 584, 777
Nowosielski, Matthew, 1151
Nunes, Paulino, 116
Nussbaum, Mike, 1221, 444,
Nussbaum, Mike, 156

What I would like to turn this into:

Nowosielski, Matthew, 484, 584, 777, 1151
Nunes, Paulino, 116
Nussbaum, Mike, 156, 444, 1221

As you can see, the lines do not end consistently: some end with a space, some with a newline, and some with a comma. In essence, I need to concatenate the lines that start with the same full name, dropping the redundant name when merging and keeping the number fields in numerical order.

My gut tells me to pick up a little Perl or awk, but I have no experience with either. I looked into both, and after some searching and reading I could not find a clear or clean route to a solution.

So my question is: what would be the best tool that I could learn well enough to accomplish this task? And given the proposed tool, are there any suggestions on how to approach the problem?

+3
4

#!/usr/bin/env perl
use strict;
use warnings;

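my %merged;    # "Last, First" => array ref of numbers

while (my $record = <DATA>) {
    chomp $record;
    # Split on any run of commas and spaces; this also absorbs trailing separators.
    my ($lname, $fname, @stuff) = split /[, ]+/, $record;
    # Keep only the purely numeric fields (drops stray words such as "Kimball").
    push @{ $merged{"$lname, $fname"} }, grep { m/^\d+$/ } @stuff;
}

# Print names alphabetically, each with its numbers in ascending numeric order.
foreach my $name (sort keys %merged) {
    print $name, ", ", join( ', ', sort { $a <=> $b } @{$merged{$name}}), "\n";
}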

__DATA__
Nowosielski, Matthew, 484, 584, 777
Nowosielski, Matthew, 1151
Nunes, Paulino, 116
Nussbaum, Mike, 1221, 444,
Nussbaum, Mike, 156
Nowosielski, Matthew, Kimball, 485, 684, 277

Output:

Nowosielski, Matthew, 277, 484, 485, 584, 684, 777, 1151
Nunes, Paulino, 116
Nussbaum, Mike, 156, 444, 1221

#!/usr/bin/env perl
use strict;
use warnings;

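my %merged;    # "Last, First" => array ref of number fields

while (my $record = <DATA>) {
    chomp $record;
    # Split on commas, discarding the whitespace around them.
    my ($lname, $fname, @stuff) = split /\s*,\s*/, $record;

    push @{ $merged{"$lname, $fname"} }, @stuff;
}

# Hash order is arbitrary, so names (and numbers) come out unsorted here.
while (my ($name, $stuff) = each %merged) {
    print $name, ", ", join( ', ', @$stuff), "\n";
}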

__DATA__
Nowosielski, Matthew, 484, 584, 777
Nowosielski, Matthew, 1151
Nunes, Paulino, 116
Nussbaum, Mike, 1221, 444,
Nussbaum, Mike, 156

+4

Alternatively, a Python script can do this.
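A minimal Python sketch of the dictionary-based merge might look like the following. It is an illustration only, not the answerer's original script: the function name `merge_index` is hypothetical, and it assumes, as the Perl version above does, that the last name and first name are each a single word.

```python
import re
from collections import defaultdict


def merge_index(lines):
    """Merge lines that share a 'Last, First' name and sort their numbers."""
    merged = defaultdict(list)
    for line in lines:
        # Split on runs of commas and/or whitespace; drop the empty pieces
        # that trailing separators leave behind.
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if len(fields) < 2:
            continue  # assumption: blank or malformed lines are skipped
        last, first, *rest = fields
        # Keep only purely numeric fields, as integers so they sort numerically.
        merged[f"{last}, {first}"].extend(int(f) for f in rest if f.isdecimal())
    # One output line per name, names in alphabetical order.
    return [
        ", ".join([name] + [str(n) for n in sorted(nums)])
        for name, nums in sorted(merged.items())
    ]


if __name__ == "__main__":
    sample = [
        "Nowosielski, Matthew, 484, 584, 777",
        "Nowosielski, Matthew, 1151",
        "Nunes, Paulino, 116",
        "Nussbaum, Mike, 1221, 444,",
        "Nussbaum, Mike, 156",
    ]
    print("\n".join(merge_index(sample)))
```

Converting the fields to integers before sorting matters: sorted as strings, "1221" would come before "156".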

+2

Any of Perl, Python, or Awk would do the job; sed (or C) would be a much less natural fit.

awk:

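awk -F', *' '{ for (i = 3; i <= NF; i++) if ($i != "") names[$1 ", " $2] = names[$1 ", " $2] ", " $i }
     END { for (name in names) printf "%s%s\n", name, names[name] }'

Redirect the input file into it with <. The -F', *' option makes awk split fields on a comma followed by optional spaces instead of on whitespace.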

Sorting the numbers is harder in awk than in Perl. (GNU Awk does provide the asort and asorti functions for sorting arrays.)

+1

AWK

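#!/usr/bin/awk -f
# Assumes lines with the same name are adjacent (for example, sorted input).
{ sub(/[, \t]+$/, "") }    # drop trailing commas and spaces
$1 == lastOne && $2 == lastTwo { $1 = ""; $2 = ""; sub(/^ +/, ""); printf ", %s", $0; next }
{ printf "%s%s", (NR > 1 ? "\n" : ""), $0; lastOne = $1; lastTwo = $2 }
END { printf "\n" }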

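The script assumes that lines for the same name are adjacent, and it appends the numbers in input order without sorting them.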
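The same streaming idea, merging only runs of adjacent lines, can be sketched in Python with `itertools.groupby`. This is a hedged illustration rather than a translation of the awk script: the function name `merge_adjacent` is hypothetical, it assumes every non-blank line has at least a last name and a first name, and unlike the awk version it also sorts the numbers within each run.

```python
from itertools import groupby


def merge_adjacent(lines):
    """Yield one merged line per run of adjacent lines with the same name."""

    def parse(line):
        # Split on commas, trim whitespace, and drop empty trailing fields.
        fields = [f.strip() for f in line.split(",")]
        fields = [f for f in fields if f]
        # Assumption: the first two fields are always last and first name.
        return (fields[0], fields[1]), fields[2:]

    parsed = (parse(line) for line in lines if line.strip())
    # groupby only groups consecutive items, so the input must already be
    # ordered so that lines for the same person sit next to each other.
    for (last, first), group in groupby(parsed, key=lambda item: item[0]):
        numbers = sorted(
            int(n) for _, rest in group for n in rest if n.isdecimal()
        )
        yield ", ".join([last, first] + [str(n) for n in numbers])
```

Because only one run is held in memory at a time, this suits large, pre-sorted files; for unsorted input, a dictionary keyed by name is the safer choice.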

0

Source: https://habr.com/ru/post/1772581/

