SED or AWK script to replace multiple texts

I am trying to do the following with a sed script, but it takes too much time. It seems I'm doing something wrong.

Scenario: I have student records (more than 1 million) in students.txt. On each line of this file, the first 10 characters are the student ID, the next 10 characters are the contact number, and so on.

students.txt

1000000001 9234567890 XXX ...
1000000002 9325788532 YYY ...
.
.
.
1001000000 8766443367 ZZZZ ...

I have another file (encrypted_contact_numbers.txt) that lists every phone number together with its corresponding encrypted phone number, as shown below.

encrypted_contact_numbers.txt

Phone_Number, Encrypted_Phone_Number

9234567890, 1122334455
9325788532, 4466742178
.
.
.
8766443367, 2964267747

I want to replace every contact number (characters 11 to 20) in students.txt with the corresponding encrypted phone number from encrypted_contact_numbers.txt.

Expected Result:

1000000001 1122334455 XXX ...
1000000002 4466742178 YYY ...
.
.
.
1001000000 2964267747 ZZZZ ...

I am using the sed scripts below to perform this operation. They work, but they are too slow.

Approach 1:

while read -r pattern replacement; do   
    sed -i "s/$pattern/$replacement/" students.txt
done < encrypted_contact_numbers.txt

Approach 2:

sed 's| *\([^ ]*\) *\([^ ]*\).*|s/\1/\2/g|' <encrypted_contact_numbers.txt |
sed -f- students.txt > outfile.txt

Is there a way to quickly process this huge file?

: 9 2018

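An AWK or Perl solution is also fine, as long as the replacement happens only at the given position (10-20). The same number can also appear elsewhere on a line, and those occurrences must not be replaced. Is that possible?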

students.txt:

1000000001 9234567890 XXX... 9234567890
1000000002 9325788532 YYY...
.
.
.
1001000000 8766443367 ZZZZ 9234567890...
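
For the sed route, one way to keep the generated commands from touching a number outside columns 11 to 20 is to anchor each command on the first ten characters of the line. The following is only a minimal sketch, assuming fixed-width records with no separator between the ID and the number, and a map file with one `number, encrypted` pair per line; the function name `encrypt_with_sed` is hypothetical.

```shell
#!/bin/sh
# encrypt_with_sed MAP STUDENTS  (hypothetical helper name)
# Turns each "number, encrypted" line of MAP into an anchored sed command
# and applies the generated script to STUDENTS, writing to stdout.
encrypt_with_sed() {
    ews_script=$(mktemp) || return 1
    # Each generated command looks like:
    #   s/^\(.\{10\}\)9234567890/\11122334455/;t
    # i.e. keep the first 10 characters (\1), then emit the encrypted digits.
    # The trailing "t" jumps to the end of the script after a successful
    # substitution, so an already encrypted value is never replaced again.
    # Header and blank lines do not match the pattern and are skipped.
    sed -n 's|^ *\([0-9]\{10\}\) *, *\([0-9]\{10\}\) *$|s/^\\(.\\{10\\}\\)\1/\\1\2/;t|p' \
        "$1" > "$ews_script"
    sed -f "$ews_script" "$2"
    rm -f "$ews_script"
}

# Usage:
#   encrypt_with_sed encrypted_contact_numbers.txt students.txt > outfile.txt
```

Every input line is still tested against each generated command until one matches, so this is likely to stay slow for a million mappings; it only addresses the position requirement.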


Please try the following awk solution.

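awk '
FNR==NR{                       # first file: build the lookup table
  sub(/ +$/,"");               # strip trailing spaces
  a[$1]=$2;                    # phone number -> encrypted number
  next
}
(substr($0,11,10) in a){       # columns 11-20 hold a known number
  print substr($0,1,10) a[substr($0,11,10)] substr($0,21)
}                              # lines without a known number are not printed
' FS=", " encrypted_contact_numbers.txt students.txt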

The output will be as follows.

10000000011122334455XXX...
10000000024466742178YYY...

awk to the rescue!

With the comma-separated mapping in phone_map and the student records in data_file:

awk -F', *' 'NR==FNR {a[$1]=$2; next}
                     {key=substr($0,11,10)}
            key in a {$0=substr($0,1,10) a[key] substr($0,21)}1' phone_map data_file

If the number is found in the map, it is replaced; otherwise, the line is printed unchanged.
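
To check this behaviour on a small sample before running it on the full file, a lookup of this kind can be wrapped in a shell function and fed a few rows. This is only a sketch: the function name `replace_phone` is hypothetical, and the sample rows assume fixed-width records with no separator between the ID and the number.

```shell
#!/bin/sh
# replace_phone MAP DATA  (hypothetical wrapper around an awk lookup)
replace_phone() {
    awk -F', *' '
        NR == FNR { map[$1] = $2; next }   # first file: number -> encrypted number
        {
            key = substr($0, 11, 10)       # columns 11-20 of the record
            if (key in map)
                $0 = substr($0, 1, 10) map[key] substr($0, 21)
            print                          # unmatched lines pass through unchanged
        }
    ' "$1" "$2"
}

# Small sample: the first row repeats its number outside columns 11-20,
# and the third row has a number that is not in the map.
tmp=$(mktemp -d) || exit 1
cat > "$tmp/phone_map" <<'EOF'
9234567890, 1122334455
9325788532, 4466742178
EOF
cat > "$tmp/data_file" <<'EOF'
10000000019234567890XXX 9234567890
10000000029325788532YYY
10000000035432112345ZZZ
EOF
replace_phone "$tmp/phone_map" "$tmp/data_file"
rm -rf "$tmp"
```

On this sample the first two rows come out with the encrypted numbers in columns 11 to 20, the trailing 9234567890 on the first row is left alone, and the third row is printed as it was.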


Perl? :) Also on Perl Monks.

@Borodin. , , .

#!/usr/bin/env perl

use strict;     # keep out of trouble
use warnings;   # ditto

my %numbers;    # map from real phone number to encrypted phone number

open(my $enc, '<', 'encrypted_contact_numbers.txt') or die("Can't open map file: $!");
while(<$enc>) {
    s{\s+}{}g;                               # remove all whitespace
    next unless length;                      # skip blank lines
    my ($regular, $encrypted) = split ',';
    next unless defined $encrypted;          # skip lines without a comma
    $numbers{$regular} = $encrypted;
}
close($enc);

# Make a regex that will match any of the numbers of interest,
# then compile it - we no longer need the string representation
my $number_pattern = join '|', map quotemeta, keys %numbers;
$number_pattern = qr{$number_pattern};

while(<>) {     # process each line of the input
    next unless length > 1;     # Skip empty lines (don't need this line if there aren't any in your input file)
    substr($_, 10, 10) =~ s{($number_pattern)}{$numbers{$1}}e;
    # substr: replace only in columns 11--20
    # Replacement (s{}{}e): the 'e' means the replacement text is perl code.
    print;  # output the modified line
}

Test

Tested with Perl v5.22.4.

encrypted_contact_numbers.txt

9234567890, 1122334455
9325788532, 4466742178

students.txt

aaaaaaaaaa9234567890XXX...
bbbbbbbbbb9325788532YYY...
cccccccccc8766443367ZZZZ...
dddddddddd5432112345Nonexistent phone number

Output of ./process.pl students.txt:

aaaaaaaaaa1122334455XXX...
bbbbbbbbbb4466742178YYY...
cccccccccc8766443367ZZZZ...
dddddddddd5432112345Nonexistent phone number

Numbers that are not in the map file are left unchanged.


Source: https://habr.com/ru/post/1692595/

