Why is ff still storing data in RAM?

Using R's ff package, I imported a csv file into an ffdf object, but was surprised to find that the object took up about 700 MB of RAM. Isn't ff supposed to keep the data on disk rather than in RAM? Did I do something wrong? I am new to R. Any advice is welcome. Thanks.

 > training.ffdf <- read.csv.ffdf(file="c:/temp/training.csv", header=T)
 > # [Edit: the csv file is conceptually a large data frame consisting
 > # of heterogeneous types of data --- some integers and some character
 > # strings.]
 >
 > # The ffdf object occupies 718MB!!!
 > object.size(training.ffdf)
 753193048 bytes
 Warning messages:
 1: In structure(.Internal(object.size(x)), class = "object_size") :
   Reached total allocation of 1535Mb: see help(memory.size)
 2: In structure(.Internal(object.size(x)), class = "object_size") :
   Reached total allocation of 1535Mb: see help(memory.size)
 >
 > # Shouldn't biglm be able to process data in small chunks?!
 > fit <- biglm(y ~ as.factor(x), data=training.ffdf)
 Error: cannot allocate vector of size 18.5 Mb

Edit: I followed Tommy's advice, skipped the call to object.size, and looked at the Task Manager instead (I ran R on a Windows XP machine with 4 GB of RAM). I deleted the object, closed R, reopened it, and loaded the data from the saved ff image. The problem persisted:

 > library(ff); library(biglm)
 > # At this point RGui.exe had used up 26176 KB of memory
 > ffload(file="c:/temp/trainingffimg")
 > # Now 701160 KB
 > fit <- biglm(y ~ as.factor(x), data=training.ffdf)
 Error: cannot allocate vector of size 18.5 Mb

I also tried

 > options("ffmaxbytes" = 402653184) # default = 804782080 B ~ 767.5 MB 

but after loading the data, RGui still used more than 700 MB of memory and the biglm regression still failed with the same allocation error.

+4
3 answers

You need to feed biglm the data in chunks, see ?biglm. If you pass an ffdf object instead of a data.frame, you run into one of these two problems:

  • ffdf is not a data.frame, so something undefined happens
  • the function you pass it to tries to convert the ffdf to a data.frame, e.g. via as.data.frame(ffdf), which easily exhausts your RAM; this is probably what is happening to you

Check ?chunk.ffdf for an example of how to pass chunks from an ffdf to biglm; a sketch is given below.
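
For concreteness, here is a minimal sketch of that chunk-wise approach. It assumes the column names y and x from the question, and that chunk() splits the ffdf into row ranges usable for subsetting; treat it as an outline to adapt, not a verified solution:

    library(ff)
    library(biglm)

    # Split the ffdf into row chunks; subsetting with one chunk pulls only
    # that slice of rows into RAM as an ordinary data.frame.
    chunks <- chunk(training.ffdf)

    # Fit the model on the first chunk, then update it with the remaining chunks.
    fit <- biglm(y ~ as.factor(x), data = training.ffdf[chunks[[1]], ])
    for (i in chunks[-1]) {
      fit <- update(fit, training.ffdf[i, ])
    }
    summary(fit)

Because an ff factor column carries its full set of levels, each chunk should present the same levels to as.factor(x), which keeps the chunk-wise updates consistent.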

+5

The ff package uses memory mapping to load only portions of the data into memory as they are needed.

But it seems that by calling object.size you are actually forcing it to load the whole thing into memory! That is what the warning messages indicate...

So don't do that. Use the Task Manager (Windows) or the top command (Linux) to see how much memory the R process uses before and after you load the data.
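
If you prefer to check from inside R rather than the Task Manager, something along these lines should do it. This is only a sketch, and it assumes physical() and filename() return the backing ff vectors and their file paths as described in the ff documentation:

    library(ff)

    gc(reset = TRUE)   # baseline R heap usage
    training.ffdf <- read.csv.ffdf(file = "c:/temp/training.csv", header = TRUE)
    gc()               # the R heap should barely grow; the data lives in ff files on disk

    # Avoid object.size(training.ffdf): as noted above, it drags the data into RAM.
    # Check the on-disk footprint of the backing files instead:
    files <- sapply(physical(training.ffdf), filename)
    sum(file.info(files)$size)   # total bytes stored on disk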

+2

I had the same problem and asked a question about it, and there is a possible explanation for yours. When you read the file, character columns are treated as factors, and if there are many unique levels those levels go into RAM. ff seems to always keep factor levels in RAM (see the sketch below for a way to measure this). See this answer from jwijffels to my question:

Loading ffdf data takes up a lot of memory
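
If you want to see how much RAM those levels alone take, a rough check along these lines should work; it assumes physical() returns the underlying ff columns and that levels() on them gives the in-RAM character vector of levels (I have not run this against your data):

    library(ff)

    # Sum the size of the factor levels of every column; the integer codes
    # stay on disk, but these level vectors are held in RAM.
    level_bytes <- sapply(physical(training.ffdf), function(col) {
      lv <- levels(col)
      if (is.null(lv)) 0 else as.numeric(object.size(lv))
    })
    sum(level_bytes)   # bytes of RAM taken by factor levels alone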

Best, Miguel.

+1
