What is the best way to store distances with H2O?

Suppose I have 2 data.frames, and I want to calculate the Euclidean distance between all their rows. My code is:

set.seed(121)
# Load library
library(h2o)
system.time({
  h2o.init()
  # Create the df and convert to h2o frame format
  df1 <- as.h2o(matrix(rnorm(7500 * 40), ncol = 40))
  df2 <- as.h2o(matrix(rnorm(1250 * 40), ncol = 40))
  # Create a matrix in which I will record the distances
  # (one row per row of df1, one column per row of df2)
  matrix1 <- as.h2o(matrix(0, nrow = 7500, ncol = 1250))
  # Loop over the rows of df2 to calculate all the distances
  for (i in 1:nrow(df2)){
    matrix1[, i] <- h2o.sqrt(h2o.distance(df1, df2[i, ]))
  }
})

I am sure there is a more efficient way to store it in a matrix.

1 answer

You do not need to calculate the distances inside a loop; the H2O distance function can efficiently calculate them for all rows at once. For two data frames with dimensions n x k and m x k, you can find the n x m distance matrix as follows:

distance_matrix <- h2o.distance(df1, df2, 'l2')

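Currently, h2o.distance() supports the following distance measures: "l1" - absolute distance (L1 norm), "l2" - Euclidean distance (L2 norm), "cosine" - cosine similarity, and "cosine_sq" - squared cosine similarity.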
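To make the measure names above concrete, here is a minimal base-R sketch of their textbook definitions for a single pair of vectors. The helper names (l1_dist, l2_dist, cosine_sim, cosine_sq_sim) are hypothetical and are not part of H2O; H2O's own implementation may differ in details, so treat this as an illustration rather than a description of its internals.

```r
# Textbook definitions of the four measures for one pair of numeric vectors.
# All helper names here are hypothetical; none of them is an H2O function.
l1_dist       <- function(x, y) sum(abs(x - y))        # absolute (L1) distance
l2_dist       <- function(x, y) sqrt(sum((x - y)^2))   # Euclidean (L2) distance
cosine_sim    <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))       # cosine similarity
}
cosine_sq_sim <- function(x, y) cosine_sim(x, y)^2     # squared cosine similarity

x <- c(3, 4)
y <- c(4, 3)

l1_dist(x, y)        # |3 - 4| + |4 - 3| = 2
l2_dist(x, y)        # sqrt(1 + 1) = sqrt(2)
cosine_sim(x, y)     # 24 / (5 * 5) = 0.96
cosine_sq_sim(x, y)  # 0.96^2 = 0.9216
```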

For your example, the code would be:

library(h2o)
h2o.init()
df1 <- as.h2o(matrix(rnorm(7500 * 40), ncol = 40))
df2 <- as.h2o(matrix(rnorm(1250 * 40), ncol = 40))
distance_matrix <- h2o.distance(df1, df2, 'l2')

The result is a distance matrix with 7500 rows x 1250 columns.
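If you want to sanity-check the shape and values of such a result on small inputs, the following base-R sketch computes the same kind of n x m Euclidean distance matrix without H2O. The helper name pairwise_l2 is hypothetical, and the sketch assumes that entry [i, j] holds the distance between row i of the first matrix and row j of the second.

```r
# Pairwise Euclidean distances between the rows of a (n x k) and b (m x k).
# pairwise_l2 is a hypothetical helper, not part of H2O.
pairwise_l2 <- function(a, b) {
  stopifnot(ncol(a) == ncol(b))
  # ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 * (a_i . b_j)
  sq <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b)
  # Clamp tiny negative values caused by floating-point rounding
  sqrt(pmax(sq, 0))
}

set.seed(121)
a <- matrix(rnorm(6 * 4), ncol = 4)  # small stand-in for df1
b <- matrix(rnorm(3 * 4), ncol = 4)  # small stand-in for df2
d <- pairwise_l2(a, b)

dim(d)  # 6 rows (rows of a) by 3 columns (rows of b)
```

On small frames, one could compare d with as.matrix(distance_matrix) pulled back from H2O, assuming the two use the same row/column orientation.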

Source: https://habr.com/ru/post/1695722/
