How to quickly get data into h2o

What my question is:

Hardware / Space:

  • 32 Xeon threads with ~256 GB RAM
  • ~65 GB of data to upload (about 5.6 billion cells).

Problem:
It takes several hours to upload my data to h2o. No special processing is involved, only "as.h2o(...)".

It takes less than a minute using "fread" to read the text into the R session, and then I do a few row / column transformations (diffs, lags) and try to import.

Total R memory use is ~56 GB before any attempt at "as.h2o", so the 128 GB allocated shouldn't be too crazy, right?

Question:
What can I do to make this load into h2o faster? It should take minutes, not hours.

What I have tried:

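  • bumping RAM up to 128 GB in 'h2o.init'
  • using slam, data.table, and options(...
  • converting with "as.data.frame" before "as.h2o"
  • writing to a csv file (R's write.csv chokes and takes forever. It is writing a lot of GB, though, so I understand).
  • writing to sqlite3: too many columns for a table, which is weird.
  • checking the drive cache/swap to make sure there is enough space there. Perhaps Java is using the cache. (still working)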

Update:
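So it looks like my only option is to write one giant text file and then use "h2o.importFile(...)" on it. I have written 15 GB so far.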

Update2:
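It is a hideous CSV file, at ~22 GB (~2.4M rows, ~2300 cols). For what it's worth, it took from 12:53 to 2:44 to write the csv. Importing it was substantially faster once it was written.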

Answer:

Think of as.h2o() as a convenience function that does these steps:

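  • converts your R data to a data.frame, if it is not one already.
  • saves that data.frame to a temp file on local disk (using data.table::fwrite() if available (*), otherwise write.csv())
  • calls h2o.uploadFile() on that temp file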
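
The steps above can be sketched by hand roughly as follows. This is only an illustration, not the actual body of as.h2o(): the toy data.frame, the frame name "toy_upload", and the default h2o.init() settings are assumptions made for the example.

```r
library(h2o)

h2o.init()  # assumption: a local H2O cluster can be started with defaults

# Toy data standing in for the real table (hypothetical columns).
df <- data.frame(id = 1:1000, value = rnorm(1000))

# 1. Make sure the data is a data.frame.
df <- as.data.frame(df)

# 2. Save it to a temporary CSV file on local disk.
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE)

# 3. Push the file from the R client to the cluster and parse it.
hf <- h2o.uploadFile(tmp, destination_frame = "toy_upload")

unlink(tmp)  # tidy up the temp file once the frame is in H2O
dim(hf)
```

With tens of GB of data, step 2 is where most of the time goes when write.csv() is the writer.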

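As your updates say, writing huge data files to disk can take a while. But the other pain point here is using h2o.uploadFile() instead of the quicker h2o.importFile(). Which one to use comes down to visibility: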

  • With h2o.uploadFile(), your client has to be able to see the file.
  • With h2o.importFile(), your cluster has to be able to see the file.

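When your client runs on the same machine as one of your cluster nodes, the data file is visible to both client and cluster, so always prefer h2o.importFile(). (It does a multi-threaded import.)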
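
A minimal sketch of that route, assuming the R session and the H2O node share a machine so the cluster can read the path directly. The toy table, file name and frame name are made up for illustration; data.table::fwrite() stands in for write.csv() because it writes with multiple threads.

```r
library(data.table)
library(h2o)

h2o.init()  # assumption: the cluster runs on the same machine as this R session

# Toy stand-in for the transformed data (hypothetical columns).
dt <- data.table(id = 1:1000, value = rnorm(1000))

# Write the CSV somewhere the cluster can see.
csv_path <- file.path(tempdir(), "toy_import.csv")
fwrite(dt, csv_path)

# The cluster reads and parses the file itself.
hf <- h2o.importFile(normalizePath(csv_path), destination_frame = "toy_import")
dim(hf)
```

On a multi-node cluster the path would have to be visible to every node (for example a shared file system), which this sketch does not cover.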

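A couple more tips: only bring into the R session the data you actually need there. And remember that both R and H2O are column-oriented, so cbind can be quick. If you only need to process 100 of your 2300 columns in R, keep those in one csv file and the other 2200 columns in another CSV file. Then h2o.cbind() them after loading each into H2O.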
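
The column split could look roughly like this on a toy scale. The column names, file names and the diff transformation are assumptions for the example; in practice the untouched columns would already sit in their own file and never pass through R.

```r
library(data.table)
library(h2o)

h2o.init()  # assumption: local cluster, so both files are visible to it

# Toy wide table: a, b need processing in R; c, d do not.
wide <- data.table(a = rnorm(1000), b = rnorm(1000),
                   c = rnorm(1000), d = rnorm(1000))

path_processed <- file.path(tempdir(), "cols_processed.csv")
path_untouched <- file.path(tempdir(), "cols_untouched.csv")

# Only the columns that need work are transformed in R.
processed <- wide[, .(a, b)]
processed[, a_diff := c(NA, diff(a))]  # example transformation
fwrite(processed, path_processed)

# Stand-in for the file holding the remaining columns.
fwrite(wide[, .(c, d)], path_untouched)

# Load each file into H2O, then bind the columns inside the cluster.
hf_processed <- h2o.importFile(normalizePath(path_processed))
hf_untouched <- h2o.importFile(normalizePath(path_untouched))
hf <- h2o.cbind(hf_processed, hf_untouched)
dim(hf)
```

Both files must keep the same row order, since the bind is positional.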

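*: Run h2o:::as.h2o.data.frame (without parentheses) to see the actual code. To write with data.table you first need to set options(h2o.use.data.table = TRUE); you can also switch it on or off with h2o.fwrite.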
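
For illustration, setting the option before calling as.h2o() might look like this. It is a sketch on toy data, and whether the faster writer is actually used depends on the installed h2o and data.table versions.

```r
library(data.table)
library(h2o)

h2o.init()  # assumption: a local cluster with default settings

# Ask h2o to use data.table for the temp-file write inside as.h2o().
options("h2o.use.data.table" = TRUE)

dt <- data.table(x = rnorm(1000), y = rnorm(1000))  # toy data
hf <- as.h2o(dt, destination_frame = "toy_as_h2o")
dim(hf)
```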


Source: https://habr.com/ru/post/1695713/