Efficient way to maintain an H2O data frame

Suppose I have a function getData() that returns data (think of it as a data stream). I need to build an H2O data frame from this data, appending each incoming record as a new row only if it is not already in the data frame.

One obvious way:

  • Keep a global H2O data frame
  • Create a one-row H2O data frame from the received data (I am using as.h2o())
  • Check whether that row is already present in the global data frame (using h2o.which() or any other function)
  • If it is missing, append it to the global data frame (using h2o.rbind())

The above solution is too slow. Creating an H2O data frame every time data arrives (the second step) takes too much time. (I have only tested this on a small data set.)

I also thought about collecting the rows in an R data frame and then appending them in batches using h2o.rbind().

What is the best way (time is priority)?

1 answer

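as.h2o() copies data from R memory into H2O memory, so it is not a fast operation: it writes the data to a file on disk and then reads that file into H2O. You can speed up as.h2o() by using data.table. If data.table is installed and enabled, as.h2o() uses data.table::fwrite() instead of utils::write.csv().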

library(data.table)
# Let as.h2o() write its temporary file with data.table::fwrite()
options("h2o.use.data.table" = TRUE)

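Instead of calling as.h2o() for every row, I would accumulate the incoming rows in an R data.frame, convert that data.frame to an H2OFrame with a single as.h2o() call (using data.table), and then, once you have the H2OFrame, append the "new" H2OFrame to the existing one with h2o.rbind().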
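
A minimal sketch of the batching idea above, not a definitive implementation. The names seen_keys, pending, row_key, buffer_row and flush_to_h2o are hypothetical helpers invented for this illustration. The sketch assumes that a row counts as a duplicate when all of its column values match an earlier row, and it does the duplicate check in plain R so that no H2O call is needed per record.

```r
# Keys of rows seen so far, and the new rows not yet uploaded to H2O.
seen_keys <- new.env(hash = TRUE)
pending   <- list()

# Build a string key from all column values of a one-row data.frame
# (assumption: equal values in every column means "same row").
row_key <- function(row) {
  paste(unlist(lapply(row, as.character)), collapse = "\r")
}

# Buffer a one-row data.frame in R unless an identical row was seen before.
# Returns TRUE if the row was new, FALSE if it was skipped.
buffer_row <- function(row) {
  key <- row_key(row)
  if (exists(key, envir = seen_keys, inherits = FALSE)) return(FALSE)
  assign(key, TRUE, envir = seen_keys)
  pending[[length(pending) + 1L]] <<- row
  TRUE
}

# Upload everything buffered so far with ONE as.h2o() call and append it to
# the existing H2OFrame (pass NULL for the first batch). Needs the h2o
# package and a running cluster, i.e. h2o::h2o.init() was called beforehand.
flush_to_h2o <- function(global_hf = NULL) {
  if (length(pending) == 0L) return(global_hf)
  batch <- do.call(rbind, pending)
  pending <<- list()
  options("h2o.use.data.table" = TRUE)  # fwrite() is used if data.table is installed
  batch_hf <- h2o::as.h2o(batch)
  if (is.null(global_hf)) batch_hf else h2o::h2o.rbind(global_hf, batch_hf)
}

# Pure-R demo of the buffering step (no H2O cluster needed):
buffer_row(data.frame(id = 1L, label = "a"))  # new row, buffered
buffer_row(data.frame(id = 2L, label = "b"))  # new row, buffered
buffer_row(data.frame(id = 1L, label = "a"))  # duplicate, skipped
```

With a cluster running, you would call buffer_row(getData()) for each incoming record and run global_hf <- flush_to_h2o(global_hf) every N records or every few seconds; how often to flush is a tuning choice. The column types should stay the same from batch to batch, since h2o.rbind() expects matching columns.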



Source: https://habr.com/ru/post/1695715/

