Change distance: ignore start / end

I am looking for an algorithm that edits a distance, but which will ignore start + end in one line and space:

edit("four","foor") = 1 edit("four","noise fo or blur") = 1 

Is there an existing algorithm for this? Maybe even Perl or Python Library?

-4
source share
2 answers

The code for this is simple in concept. This is your idea of ​​what you would like to ignore, which you can add yourself:

 #!perl use v5.22; use feature qw(signatures); no warnings qw(experimental::signatures); use Text::Levenshtein qw(distance); say edit( "four", "foor" ); say edit( "four", "noise fo or blur" ); sub edit ( $start, $target ) { # transform strings to ignore what you want # ... distance( $start, $target ) } 

Perhaps you want to check all substrings of the same length:

 use v5.22; use feature qw(signatures); no warnings qw(experimental::signatures); use Text::Levenshtein qw(distance); say edit( "four", "foar" ); say edit( "four", "noise fo or blur" ); sub edit ( $start, $target ) { my $start_length = length $start; $target =~ s/\s+//g; my @all_n_chars = map { substr $target, $_, 4 } 0 .. ( length($target) - $start_length ); my $closest; my $closest_distance = $start_length + 1; foreach ( @all_n_chars ) { my $distance = distance( $start, $_ ); if( $distance < $closest_distance ) { $closest = $_; $closest_distance = $distance; say "closest: $closest Distance: $distance"; last if $distance == 0; } } return $closest_distance; } 

This very simple idea finds what you want. However, understand that other random lines may accidentally have an editing distance that is lower.

 closest: foar Distance: 1 1 closest: nois Distance: 3 closest: foor Distance: 1 1 

You can expand this to remember the true starting positions of each line so that you can find it again in the original, but that should be enough to send you on your way. If you want to use Python, I think the program may look very similar.

+4
source

Here is the Perl 6 solution. I use a grammar that knows how to capture four interesting characters, despite interstitial things. More complex requirements require a different grammar, but it is not so difficult.

At each match, an object of the NString :: Actions class gets a change to check for compliance. He does the same as before. This seems like a lot of work, and for this trivial example. For more complex examples, this will not be much worse. My version of Perl 5 would have to do many tools to understand what to save or not to save.

 use Text::Levenshtein; my $string = 'The quixotic purple and jasmine butterfly flew over the quick zany dog'; grammar NString { regex n-chars { [<.ignore-chars>* \w]**4 } regex ignore-chars { \s } } class NString::Actions { # See my subset IntInf where Int:D | Inf; has $.target; has Str $.closest is rw = ''; has IntInf $.closest-distance is rw = Inf; method n-chars ($/) { my $string = $/.subst: /\s+/, '', :g; my $distance = distance( $string, self.target ); # say "Matched <$/>. Distance for $string is $distance"; if $distance < self.closest-distance { self.closest = $string; self.closest-distance = $distance; } } } my $action = NString::Actions.new: target => 'Perl'; loop { state $from = 0; my $match = NString.subparse( $string, :rule('n-chars'), :actions($action), :c($from) ); last unless ?$match; $from++; } say "Shortest is { $action.closest } with { $action.closest-distance }"; 

(I made a direct port from Perl 5, which I will leave here)

I tried the same thing in Perl 6, but I'm sure this is a bit verbose. I was wondering if there is a smart way to capture groups of N characters for comparison. Maybe I will have some improvement later.

 use Text::Levenshtein; put edit( "four", "foar" ); put edit( "four", "noise fo or blur" ); sub edit ( Str:D $start, Str:D $target --> Int:D ) { my $target-modified = $target.subst: rx/\s+/, '', :g; my $last-position-to-check = [-] map { .chars }, $target-modified, $start; my $closest = Any; my $closest-distance = $start.chars + 1; for 0..$last-position-to-check -> $starting-pos { my $substr = $target-modified.substr: $starting-pos, $start.chars; my $this-distance = distance( $start, $substr ); put "So far: $substr -> $this-distance"; if $this-distance < $closest-distance { $closest = $substr; $closest-distance = $this-distance; } last if $this-distance = 0; } return $closest-distance // -1; } 
+4
source

Source: https://habr.com/ru/post/1270126/


All Articles