Big.matrix as data.frame in R

I recently started using R for data analysis. I now have a problem with a large query-ranking dataset (~1 GB as ASCII; in binary form it exceeds my laptop's 4 GB of RAM). Using bigmemory::big.matrix for this dataset works well, but passing such a matrix "m" to the gbm() or randomForest() algorithms causes an error:

 cannot coerce class 'structure("big.matrix", package = "bigmemory")' into a data.frame 

class(m) outputs the following:

 [1] "big.matrix"
 attr(,"package")
 [1] "bigmemory"

Is there a way to properly pass a big.matrix instance into these algorithms?

2 answers

I obviously cannot verify this using data of your scale, but I can reproduce your errors using the formula interface of each function:

 require(bigmemory)
 m <- matrix(sample(0:1, 5000, replace = TRUE), 1000, 5)
 colnames(m) <- paste("V", 1:5, sep = "")
 bm <- as.big.matrix(m, type = "integer")

 require(gbm)
 require(randomForest)

 # Throws the error you describe
 rs <- randomForest(V1 ~ ., data = bm)
 # Runs without error (with a warning about the response only having two values)
 rs <- randomForest(x = bm[, -1], y = bm[, 1])
 # Throws the error you describe
 rs <- gbm(V1 ~ ., data = bm)
 # Runs without error
 rs <- gbm.fit(x = bm[, -1], y = bm[, 1])

Avoiding the formula interface of randomForest is fairly common advice for large data sets; it can be very inefficient. If you read ?gbm, you will see a similar recommendation steering you toward gbm.fit for large data.
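One caveat about the x/y calls above: indexing a big.matrix with [ extracts the selected block into an ordinary R matrix in RAM, so the non-formula interface avoids the data.frame coercion but not the memory cost of the extracted data. The sketch below illustrates this on the same toy data. The row subsample (n_sub, idx, x_sub, y_sub) is a hypothetical addition for illustration, not something the answer above proposes.

```r
library(bigmemory)

# Small stand-in for the real data (same shape as the toy example above)
set.seed(1)
m <- matrix(sample(0:1, 5000, replace = TRUE), 1000, 5)
colnames(m) <- paste0("V", 1:5)
bm <- as.big.matrix(m, type = "integer")

# bm itself is a big.matrix, but extracting with [ gives a plain R matrix
is.big.matrix(bm)
is.matrix(bm[, 2:ncol(bm)])

# Hypothetical: pull only a random subset of rows into RAM
n_sub <- 200                              # assumed subsample size
idx   <- sort(sample(nrow(bm), n_sub))    # row indices to extract
x_sub <- bm[idx, 2:ncol(bm)]              # predictors, ordinary matrix
y_sub <- bm[idx, 1]                       # response, ordinary vector
dim(x_sub)
```

x_sub and y_sub can then be passed as x and y to randomForest() or gbm.fit() in the same way as bm[, -1] and bm[, 1] above.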


Numeric objects often occupy more space in memory than they do on disk. Each "double" element in a vector or matrix takes 8 bytes. When you coerce an object to a data.frame, it may need to be copied in RAM. You should avoid functions and data structures outside those supported by the bigmemory / big*** family of packages. "biglm" is available, but I doubt you can expect gbm() or randomForest() to recognize and use objects from the "big" family.
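As a rough check of the 8-bytes-per-double point, the in-memory size of a numeric matrix can be estimated before loading it. A value stored in a text file as a single digit plus a separator takes about 2 bytes, but 8 bytes once read in as a double. The dimensions below (n_rows, n_cols) are made-up values for illustration, not the size of the dataset in the question.

```r
# Hypothetical dimensions, chosen only for illustration
n_rows <- 1e6
n_cols <- 50

# Doubles take 8 bytes each, integers 4 bytes each
est_bytes_double  <- n_rows * n_cols * 8
est_bytes_integer <- n_rows * n_cols * 4
cat(sprintf("As double:  about %.0f MB\n", est_bytes_double / 2^20))
cat(sprintf("As integer: about %.0f MB\n", est_bytes_integer / 2^20))

# Confirm the per-element cost on a small vector
# (object.size also counts a small fixed header)
x <- numeric(1e5)
print(object.size(x), units = "Kb")
```

If the data really are integer-valued, as in the toy example with type = "integer", the integer estimate is the more relevant one.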


Source: https://habr.com/ru/post/902676/

