H2O is slower than data.table in R

How is it possible that storing data in an H2O frame is slower than storing it in a data.table?

# Packages used: "h2o" and "data.table"
library(h2o)
library(data.table)

# Start (or connect to) the H2O server before creating any H2O frame
h2o.init(nthreads = -1)

# Create the matrices
matrix1 <- data.table(matrix(rnorm(1000 * 1000), ncol = 1000, nrow = 1000))
matrix2 <- h2o.createFrame(1000, 1000)

# Store values in the data.table
for (i in 1:1000) {
  matrix1[i, 1] <- 3
}
# Store values in the H2O frame
for (i in 1:1000) {
  matrix2[i, 1] <- 3
}

Thanks!

+1
2 answers

H2O has a client/server architecture. (See http://docs.h2o.ai/h2o/latest-stable/h2o-docs/architecture.html)

What you have shown is a very inefficient way to populate an H2O frame in H2O's memory: every single write turns into a network call. You almost certainly don't want this.

In your example, since the data is small, a reasonable approach is to do the initial assignment in a local data frame (or data.table) and then push the result to the server with as.h2o().

h2o_frame = as.h2o(matrix1)
head(h2o_frame)

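This pushes an R data frame from the R client into an H2O frame in the H2O server's memory. (To go in the opposite direction, use as.data.table().)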
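As a rough sketch of that workflow, assuming a local H2O server can be started on this machine: all modifications happen in the R session, and the finished table crosses the network once. The names local_dt and round_trip are made up for illustration, and the pull step goes through as.data.frame() first, which is a cautious choice rather than a requirement.

```r
library(h2o)
library(data.table)

h2o.init(nthreads = -1)  # start or connect to a local H2O server first

# Build and modify the data locally, where writes are cheap.
local_dt <- data.table(matrix(rnorm(1000 * 1000), nrow = 1000, ncol = 1000))
local_dt[, V1 := 3]  # one vectorized, in-place assignment to column V1

# One push to the server instead of 1000 single-cell writes.
h2o_frame <- as.h2o(local_dt)
dim(h2o_frame)

# Pull the frame back into the R session when it is needed locally.
round_trip <- as.data.table(as.data.frame(h2o_frame))
```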


data.table tips:

For data.table, prefer the in-place := syntax. It avoids copies. For example:

matrix1[i, 3 := 42]
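A small sketch of the same idea without the loop, assuming the default column names V1, V2, ... that data.table assigns to a converted matrix. The table dt is made up for illustration; address() from data.table is used only to check that the object was not reallocated.

```r
library(data.table)

dt <- data.table(matrix(rnorm(1000 * 10), nrow = 1000, ncol = 10))
addr_before <- address(dt)

# Assign the same value to every row of column V3 in a single call.
dt[, V3 := 42]

# Restrict the update to some rows with the i argument.
dt[1:10, V1 := 3]

# Compare the memory address before and after the updates.
identical(addr_before, address(dt))
```

Because := is vectorized over rows, a per-row loop is rarely needed for this kind of assignment.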

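H2O tips:

The fastest way to get data into H2O is to ingest it with the pull method, h2o.importFile(). The import is parallel and distributed.

The as.h2o() approach shown above works well for small datasets that easily fit in the memory of a single host.

To watch the network messages exchanged between R and H2O, call h2o.startLogging().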
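A minimal sketch of the pull method, assuming a local H2O server that can read the same file system as the R session. The temporary CSV file and the names csv_path and imported are made up for illustration; with a remote cluster the path would have to be one the server can see.

```r
library(h2o)
library(data.table)

h2o.init(nthreads = -1)

# Write a small example table to disk so there is something to import.
csv_path <- tempfile(fileext = ".csv")
fwrite(data.table(matrix(rnorm(1000 * 10), nrow = 1000, ncol = 10)), csv_path)

# The server reads the file itself; the data does not pass through R.
imported <- h2o.importFile(path = csv_path)
dim(imported)
```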

+3

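I can't answer your question directly, because I don't know h2o. However, I can make a guess.

Your code that fills the data.table is slow because of "copy-on-modify" semantics. If you update the table by reference, the code becomes dramatically faster.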

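# Slow: the <- assignment goes through copy-on-modify
for (i in 1:1000) {
  matrix1[i, 1] <- 3
}

# Fast: set() updates the table by reference
for (i in 1:1000) {
  set(matrix1, i, 1L, 3)
}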

With set(), my loop takes 3 milliseconds, while your loop takes 18 seconds (about 6000 times longer).
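To reproduce the comparison on your own machine, something like the following sketch can be used. The table here is smaller (1000 x 100) than the one in the question so that the slow loop finishes quickly, and the names dt_slow and dt_fast are made up for illustration. Absolute timings depend on hardware and package versions.

```r
library(data.table)

dt_slow <- data.table(matrix(rnorm(1000 * 100), nrow = 1000, ncol = 100))
dt_fast <- copy(dt_slow)  # independent copy with the same contents

# Assignment with <- inside the loop.
time_slow <- system.time(for (i in 1:1000) dt_slow[i, 1] <- 3)

# Assignment by reference with set().
time_fast <- system.time(for (i in 1:1000) set(dt_fast, i, 1L, 3))

print(time_slow["elapsed"])
print(time_fast["elapsed"])

# Both loops should leave the first column in the same state.
identical(dt_slow$V1, dt_fast$V1)
```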

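I suppose h2o works in a similar way, but with some extra work on top, since it is a special object. Perhaps some kind of message passing to the H2O cluster?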

+2

Source: https://habr.com/ru/post/1695719/
