I have a strange problem: I am saving ffdf data using

save.ffdf()

from the ffbase package, and when I load it into a new R session with

load.ffdf("data.f")

it consumes about 90% as much RAM as the same data held as a data.frame object in R. Given this problem, it makes no sense to use ffdf, right? I cannot use ffsave because I work on a server and do not have a zip application.
packageVersion("ff")     # 2.2.10
packageVersion("ffbase") # 0.6.3
Any ideas?
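For reference, the round trip described above can be sketched as follows. This is a hypothetical illustration, not the asker's actual code: the directory name `"data.f"` and the toy data are assumptions, and ff/ffbase must be installed.

```r
library(ff)
library(ffbase)

# Build a small ffdf (data lives in ff files on disk, not in RAM)
dff <- as.ffdf(data.frame(x = 1:1e6, y = runif(1e6)))
save.ffdf(dff, dir = "data.f")   # writes the ff files plus an .RData with metadata

rm(dff); gc()                    # simulate starting a fresh session
load.ffdf("data.f")              # restores dff; the data should stay memory-mapped
print(gc())                      # inspect "used (Mb)" to see the R-side footprint
```

Comparing the `gc()` output before and after `load.ffdf()` is one way to see whether the data really stays on disk or gets pulled into RAM.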
[edit] sample code to help clarify:
data <- read.csv.ffdf(file = fn, header = TRUE, colClasses = classes)
After closing the R session and opening a new one:
load.ffdf(file.name)
then I have an ffdf object data, and its memory size is almost as large as:
data.R <- data[,] # which is a data.frame.
[end of edit]
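One way to quantify the RAM difference the edit describes is to compare the memory reported by `gc()` before and after materialising the ffdf as a data.frame. This is a hypothetical measurement sketch (the toy data is an assumption; ff/ffbase are assumed installed):

```r
library(ff)
library(ffbase)

# Total Mb currently used by R, per gc() (sum of Ncells and Vcells rows)
mem_used <- function() sum(gc()[, 2])

dff <- as.ffdf(data.frame(x = 1:1e6, y = runif(1e6)))

m0 <- mem_used()
df <- dff[, ]                    # pull the entire ffdf into RAM as a data.frame
m1 <- mem_used()

cat(sprintf("data.frame copy added ~%.0f Mb over the ffdf\n", m1 - m0))
```

If the ffdf behaves as intended, the delta should be close to the full in-memory size of the data; a small delta would mean the ffdf was already holding the data in RAM.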
[SECOND EDIT :: full reproducible code]
Since my question has not yet been answered, and I still see the problem, here is a reproducible example:
dir1 <- 'P:/Projects/RLargeData'
setwd(dir1)
library(ff)
library(ffbase)
memory.limit(size = 4000)

N = 1e7
df <- data.frame(
  x = c(1:N),
  y = sample(letters, N, replace = T),
  z = sample(as.Date(sample(c(1:2000), N, replace = T), origin = "1970-01-01")),
  w = factor(sample(c(1:N/10), N, replace = T))
)
df[1:10, ]

dff <- as.ffdf(df)
head(dff)
At this point the session uses 384 MB of memory, and after gc() about 287 MB, which is roughly the size of the data on disk (also checked with the Process Explorer application on Windows).
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Danish_Denmark.1252  LC_CTYPE=Danish_Denmark.1252
    LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
    LC_TIME=Danish_Denmark.1252

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ffbase_0.7-1 ff_2.2-10    bit_1.1-9
[END SECOND EDIT]