How can I match string order between two documents in Perl?

I am having trouble writing a Perl program that matches words in two documents. Say there are two documents, A and B.

So, I want to delete words in document A that are not listed in document B.

Example 1 :

A: I eat pizza

B: She goes to the market and ate pizza

result: eat pizza

Example 2 :

A: eat pizza

B: pizza eat

Result: pizza (the word order matters, so the "eat" is removed.)

I use Perl for the system, and each document contains only a few sentences, so I don't think I need SQL.

The program is one rule in an automatic essay assessment system for the Indonesian language (Bahasa Indonesia).

Thanks, and sorry if my question is a bit confusing. I am really new to this world :)


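This is untested, so it is not guaranteed to be 100% correct, but it should give you the idea:

Solution 1: (word order does not matter)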

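#!/usr/bin/perl

use strict;
use warnings;
use File::Slurp;

# read_file croaks on failure by default, so no "|| die" is needed.
# (With "|| die", read_file runs in scalar context and returns the
# whole file as a single string instead of a list of lines.)
my @B_lines = read_file("B");
my %B_words;
foreach my $line (@B_lines) {
    # split ' ' skips leading whitespace, so no empty "word" is produced
    $B_words{$_} = 1 for split ' ', $line;
}

my @A_lines = read_file("A");
my @new_lines;
foreach my $line (@A_lines) {
    # keep only the words of A that also occur somewhere in B
    my @B_words_only = grep { $B_words{$_} } split ' ', $line;
    push @new_lines, join(" ", @B_words_only) . "\n";
}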
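File::Slurp::write_file("A_new", @new_lines) || die "Error writing A_new: $!";

This creates a new file, "A_new", that contains only the words from A that also appear in B.

It has one small bug: any run of whitespace in A is replaced by a single space, so

    word1        word2              word3

becomes

word1 word2 word3

This can be fixed, but it only matters if the whitespace has to be preserved 100%.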
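Solution 1 rebuilds each line with join(" ", ...), which is why runs of whitespace collapse. If the original spacing has to survive, one possible approach is to split with a capturing group so the separators are returned along with the words. The sketch below is only an illustration: the helper name filter_keep_whitespace and its arguments are made up for this example, and the hash reference plays the role of %B_words.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): removes the words
# of $line that are not keys of %$allowed and keeps every whitespace run
# exactly as it was.
sub filter_keep_whitespace {
    my ($line, $allowed) = @_;
    # With a capturing group, split also returns the separators, so the
    # list alternates: word, whitespace, word, whitespace, ...
    return join '',
        grep { /^\s+$/ || $allowed->{$_} }
        split /(\s+)/, $line;
}

# Example call: "foo" is dropped, the spaces on both sides of it remain.
print filter_keep_whitespace("word1    foo   word3\n",
                             { word1 => 1, word3 => 1 });
```

Note that the whitespace on both sides of a removed word is kept, so the gap where the word used to be gets wider rather than narrower.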

Solution 2: (word order matters, and the words from A are printed out without preserving whitespace)

#!/usr/bin/perl

use strict;
use warnings;
use File::Slurp;

# In scalar context read_file returns the whole file as one string
# (and croaks on error by default).
my $A_text = read_file("A");
my $B_text = read_file("B");
my @A_words = split ' ', $A_text;
my @B_words = split ' ', $B_text;

my $B_pos = 0;    # B words before this index are already consumed
my @kept;
foreach my $word (@A_words) {
    # look for $word in the part of B that has not been consumed yet
    my $i = $B_pos;
    ++$i while $i < @B_words && $B_words[$i] ne $word;
    next if $i == @B_words;    # not found: drop this word, keep scanning A
    push @kept, $word;
    $B_pos = $i + 1;           # later A words must match later in B
}
print join(" ", @kept), "\n";
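The loop in Solution 2 is greedy: it takes the first place in B where each word of A can match, and that can block a longer match later on. For example, with A = "x a b" and B = "a b x", a greedy scan keeps only "x". If the goal is to keep as many words of A as possible in B's order, a longest common subsequence (LCS) over words is one alternative. The following is a minimal sketch under that assumption, not something the original answer proposed; lcs_words is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one longest sequence of words that occurs,
# in the same order, in both array references.
sub lcs_words {
    my ($A_words, $B_words) = @_;
    my ($n, $m) = (scalar @$A_words, scalar @$B_words);

    # $len[$i][$j] = LCS length of the suffixes A[$i..] and B[$j..]
    my @len = map { [ (0) x ($m + 1) ] } 0 .. $n;
    for my $i (reverse 0 .. $n - 1) {
        for my $j (reverse 0 .. $m - 1) {
            if ($A_words->[$i] eq $B_words->[$j]) {
                $len[$i][$j] = $len[$i + 1][$j + 1] + 1;
            }
            else {
                my ($skip_a, $skip_b) = ($len[$i + 1][$j], $len[$i][$j + 1]);
                $len[$i][$j] = $skip_a >= $skip_b ? $skip_a : $skip_b;
            }
        }
    }

    # Walk the table from the start to recover one longest sequence.
    my ($i, $j, @kept) = (0, 0);
    while ($i < $n && $j < $m) {
        if ($A_words->[$i] eq $B_words->[$j]) {
            push @kept, $A_words->[$i];
            ++$i;
            ++$j;
        }
        elsif ($len[$i + 1][$j] >= $len[$i][$j + 1]) { ++$i }
        else                                         { ++$j }
    }
    return @kept;
}

print join(" ", lcs_words([qw(x a b)], [qw(a b x)])), "\n";
```

The table needs time and memory proportional to the product of the two word counts, which should be acceptable for the short documents described in the question.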

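Solution 3 (do we even need Perl? :))

You can do this in the shell without Perl (or through system() or backticks from a Perl script). Note that comm compares whole lines and expects both files to be sorted, so this assumes one word per line in sorted order: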

comm -12 A B | tr "\012" " " 

To call it from Perl:

my $new_text = `comm -12 A B | tr "\012" " " `;



Source: https://habr.com/ru/post/1746805/

