Explain the result of the "adist" function in R

I am using the adist function in R, which computes the Levenshtein distance between character strings. Here is a reproducible example:

> a <- c("bonjour", "bonsoir", "good morning", "hello world")
> b <- c("maman", "bienjoue", "printemps")

> adist(a, b, counts = TRUE)

The results I get are as follows:

     [,1] [,2] [,3]
[1,]    7    3    8
[2,]    7    5    8
[3,]   10   11   12
[4,]   11   10   11

attr(,"counts")

, , ins

     [,1] [,2] [,3]
[1,]    0    1    2
[2,]    0    1    2
[3,]    0    0    1
[4,]    0    1    0

, , del

     [,1] [,2] [,3]
[1,]    2    0    0
[2,]    2    0    0
[3,]    7    4    4
[4,]    6    4    2

, , sub

     [,1] [,2] [,3]
[1,]    5    2    6
[2,]    5    4    6
[3,]    3    7    7
[4,]    5    5    9

attr(,"trafos")

     [,1]            [,2]            [,3]           
[1,] "SSSSSDD"       "MSIMMMMS"      "SSIMSSSSI"    
[2,] "SSSSSDDS"      "MSIMSMSS"      "SSIMSSSSI"    
[3,] "SSDDDMSDDDMDD" "SSSSSDMSSDDDD" "SSSSSIMSSDDDD"
[4,] "SSSSSDDDDDDD"  "SIMSSMSSDDDD"  "SSSSSSSSSDDD"

In cell [4, 1], the counts show 6 deletions, 5 substitutions and 0 insertions. However, the "trafos" attribute for the same cell contains 5 S and 7 D, a total of 12 operations, while the distance is 11 (there is one extra D).

This cell is the Levenshtein distance between "hello world" and "maman".
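
To make the mismatch easier to see, here is a minimal sketch that tallies the letters in a "trafos" string and compares the number of edits against the distance matrix. The helper name count_ops is hypothetical (it is not part of R), and the result of the matrix comparison may depend on the R version, so no expected output is shown for it.

```r
# Hypothetical helper: count each operation letter in one "trafos" string
# (M = match, I = insertion, D = deletion, S = substitution).
count_ops <- function(trafo) {
  chars <- strsplit(trafo, "")[[1]]
  vapply(c(M = "M", I = "I", D = "D", S = "S"),
         function(op) sum(chars == op), integer(1))
}

# The string reported for cell [4, 1] in the matrix call above
ops <- count_ops("SSSSSDDDDDDD")
ops
sum(ops[c("I", "D", "S")])  # 12 edits, one more than the reported distance of 11

# Same check for every cell of the matrix result
a <- c("bonjour", "bonsoir", "good morning", "hello world")
b <- c("maman", "bienjoue", "printemps")
d <- adist(a, b, counts = TRUE)

# Number of non-match operations encoded in each transformation string
n_edits <- apply(attr(d, "trafos"), c(1, 2),
                 function(s) sum(count_ops(s)[c("I", "D", "S")]))

# Distances without the extra attributes, for a clean comparison
dist_only <- matrix(as.vector(d), nrow = nrow(d))

# Row/column indices of cells where the string disagrees with the distance
which(n_edits != dist_only, arr.ind = TRUE)
```

Any cell listed by the last line is one where the transformation string and the reported distance disagree.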

If I apply adist directly to these two strings, rather than to the two vectors, I get the following:

> adist("hello world", "maman", counts = TRUE)

     [,1]
[1,]   11

attr(,"counts")

, , ins

     [,1]
[1,]    0

, , del

     [,1]
[1,]    6

, , sub

     [,1]
[1,]    5

attr(,"trafos")

     [,1]         
[1,] "SSSSSDDDDDD" 

This looks correct: 5 S and 6 D, 11 operations in total, which matches the distance.

How can this behavior of adist() be explained?


Source: https://habr.com/ru/post/1675417/

