The RecordLinkage package has a levenshteinDist function, which is one way of calculating the edit distance between strings.
install.packages("RecordLinkage")
library(RecordLinkage)
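For intuition about what levenshteinDist returns, here is a minimal base-R sketch of the textbook dynamic-programming recurrence for Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The helper name `lev` is hypothetical and is not part of RecordLinkage. It is for illustration only, not a replacement for the package's implementation.

```r
# Hypothetical helper: classic dynamic-programming Levenshtein distance.
# d[i + 1, j + 1] holds the distance between the first i characters of a
# and the first j characters of b (R matrices are 1-indexed).
lev <- function(a, b) {
  s <- strsplit(a, "")[[1]]
  t <- strsplit(b, "")[[1]]
  m <- length(s)
  n <- length(t)
  d <- matrix(0L, nrow = m + 1, ncol = n + 1)
  d[, 1] <- 0:m  # turning the first i characters of a into "" takes i deletions
  d[1, ] <- 0:n  # turning "" into the first j characters of b takes j insertions
  for (i in seq_len(m)) {
    for (j in seq_len(n)) {
      cost <- if (s[i] == t[j]) 0L else 1L
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1L,  # deletion
                             d[i + 1, j] + 1L,  # insertion
                             d[i, j] + cost)    # substitution (or match)
    }
  }
  d[m + 1, n + 1]
}

lev("Apple", "Apricot")  # 5
```

Like levenshteinDist, this comparison is case-sensitive, so "Apple" and "apple" are at distance 1.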
Set up some data:
fruit <- c("Apple", "Apricot", "Avocado", "Banana", "Bilberry", "Blackberry", "Blackcurrant", "Blueberry", "Currant", "Cherry")
Now create a matrix of zeros to reserve memory for the distance table, then use nested for loops to calculate the individual distances. The result has one row and one column per fruit, so we can name the rows and columns after the original vector.
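n <- length(fruit)
# Preallocate an n x n matrix of zeros for the distance table
fdist <- matrix(0, nrow = n, ncol = n)
# Fill in the distance between every pair of fruits
for (i in seq_along(fruit)) {
  for (j in seq_along(fruit)) {
    fdist[i, j] <- levenshteinDist(fruit[i], fruit[j])
  }
}
# Label rows and columns with the fruit names
rownames(fdist) <- colnames(fdist) <- fruit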
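If you would rather not install an extra package, base R's utils::adist() computes the whole distance matrix in one call. With its default costs (1 for each insertion, deletion and substitution) the generalized Levenshtein distance it reports is the ordinary Levenshtein distance, so this is a sketch of an alternative to the nested loops, not what the loop code uses. The name `fdist2` is just for illustration.

```r
fruit <- c("Apple", "Apricot", "Avocado", "Banana", "Bilberry",
           "Blackberry", "Blackcurrant", "Blueberry", "Currant", "Cherry")

# All pairwise distances at once; default costs are 1 per edit operation
fdist2 <- utils::adist(fruit)

# Label rows and columns with the fruit names, as in the loop version
rownames(fdist2) <- colnames(fdist2) <- fruit

fdist2[1:3, 1:3]
```

Because adist() is vectorized internally, it avoids the R-level double loop, which matters once the vector grows beyond a few hundred strings.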
Results (only the first seven columns of the printed matrix are shown):
fdist
             Apple Apricot Avocado Banana Bilberry Blackberry Blackcurrant
Apple            0       5       6      6        7          9           12
Apricot          5       0       6      7        8         10           10
Avocado          6       6       0      6        8          9           10
Banana           6       7       6      0        7          8            8
Bilberry         7       8       8      7        0          4            9
Blackberry       9      10       9      8        4          0            5
Blackcurrant    12      10      10      8        9          5            0
Blueberry        8       9       9      8        3          3            8
Currant          7       5       6      5        8         10            6
Cherry           6       7       7      6        4          6           10
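A typical use of such a table is fuzzy matching: for each fruit, find the most similar other fruit. A minimal sketch, assuming the same fruit vector and building the matrix with utils::adist() so the snippet runs on its own; ties (e.g. Blueberry is 3 edits from both Bilberry and Blackberry) are broken by taking the first column, which is an arbitrary choice.

```r
fruit <- c("Apple", "Apricot", "Avocado", "Banana", "Bilberry",
           "Blackberry", "Blackcurrant", "Blueberry", "Currant", "Cherry")

d <- utils::adist(fruit)
rownames(d) <- colnames(d) <- fruit

# Ignore each fruit's zero distance to itself
diag(d) <- Inf

# For each row, pick the column with the smallest distance (first one on ties)
nearest <- colnames(d)[apply(d, 1, which.min)]
names(nearest) <- fruit
nearest
```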