How to create a distance matrix containing the mean absolute difference between each pair of columns?

Given the matrix

         X1 X2 X3 X4 X5
    [1,]  1  2  3  2  1
    [2,]  2  3  4  4  3
    [3,]  3  4  4  6  2
    [4,]  4  5  5  5  4
    [5,]  2  3  3  3  6
    [6,]  5  6  2  8  4

I want to create a distance matrix containing the mean absolute difference between each pair of columns, taken row by row. For example, the distance between X1 and X3 should be 1.67, because:

(abs(1-3) + abs(2-4) + abs(3-4) + abs(4-5) + abs(2-3) + abs(5-2)) / 6 = 10/6 = 1.67.
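
The arithmetic for this one pair can be checked directly in R. This is only a minimal sketch: the vectors `x1` and `x3` are the X1 and X3 columns typed in by hand, and the variable names are made up for illustration.

```r
# Columns X1 and X3 from the matrix above, typed in by hand
x1 <- c(1, 2, 3, 4, 2, 5)
x3 <- c(3, 4, 4, 5, 3, 2)

# Absolute difference row by row, then averaged over the 6 rows
abs_diff <- abs(x1 - x3)                           # 2 2 1 1 1 3
mean_abs_diff <- sum(abs_diff) / length(abs_diff)  # 10 / 6
round(mean_abs_diff, 2)                            # 1.67
```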

I tried the designdist() function from the vegan package as follows:

    designdist(t(test), method = "abs(A-B)/6", terms = "minimum")

The resulting distance for columns 1 and 3 is 0.666. The problem with this function is that it sums all the values in each column and then subtracts the sums. What I need is to sum the absolute differences row by row (each difference taken individually) and then divide by N.

1 answer

Here's a one-line solution. It uses the method argument of dist() to compute the L1 norm, also known as the city block or Manhattan distance. Note that dist() measures distances between the rows of its input, so the data is transposed with t() first in order to compare each pair of columns in your data.frame. Dividing by nrow(df) turns the sum of absolute differences into a mean.

    as.matrix(dist(t(df), "manhattan", diag=TRUE, upper=TRUE)/nrow(df))

To make it reproducible:

    df <- read.table(text="
    X1 X2 X3 X4 X5
    1 2 3 2 1
    2 3 4 4 3
    3 4 4 6 2
    4 5 5 5 4
    2 3 3 3 6
    5 6 2 8 4", header=TRUE)

    # t() turns the columns into rows, because dist() compares rows
    dmat <- as.matrix(dist(t(df), "manhattan", diag=TRUE, upper=TRUE)/nrow(df))
    print(dmat, digits=3)
    #      X1    X2   X3    X4   X5
    # X1 0.00 1.000 1.67 1.833 1.17
    # X2 1.00 0.000 1.00 0.833 1.50
    # X3 1.67 1.000 0.00 1.500 1.83
    # X4 1.83 0.833 1.50 0.000 2.33
    # X5 1.17 1.500 1.83 2.333 0.00
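
As a sanity check, the column-wise quantity asked for in the question can also be computed without dist(). The sketch below is not part of the original answer: `mean_abs_diff_cols` is a hypothetical helper written for illustration, and it is compared against the Manhattan distance on the transposed data (dist() compares rows, so the columns have to be turned into rows with t() first).

```r
df <- read.table(text = "
X1 X2 X3 X4 X5
1 2 3 2 1
2 3 4 4 3
3 4 4 6 2
4 5 5 5 4
2 3 3 3 6
5 6 2 8 4", header = TRUE)

# Hypothetical helper (not from the answer): mean absolute difference
# between every pair of columns, computed pair by pair.
mean_abs_diff_cols <- function(d) {
  m <- as.matrix(d)
  p <- ncol(m)
  out <- matrix(0, p, p, dimnames = list(colnames(m), colnames(m)))
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      out[i, j] <- mean(abs(m[, i] - m[, j]))
    }
  }
  out
}

check <- mean_abs_diff_cols(df)
round(check["X1", "X3"], 2)   # 1.67, the value expected in the question

# Same result via dist(): transpose so that columns become rows
via_dist <- as.matrix(dist(t(df), "manhattan")) / nrow(df)
all.equal(check, via_dist, check.attributes = FALSE)
```

The explicit double loop is slower than dist() on wide data, but it makes the definition easy to read and to verify against the hand calculation for X1 and X3.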

Source: https://habr.com/ru/post/916380/

