How to bind data.tables without increasing memory consumption?

I have several huge data.tables dt_1, dt_2, ..., dt_N with the same columns. I want to bind them together into one data.table. If I use

dt <- rbind(dt_1, dt_2, ..., dt_N)

or

dt <- rbindlist(list(dt_1, dt_2, ..., dt_N))

then memory usage is roughly double the amount needed for dt_1, dt_2, ..., dt_N. Is there a way to bind them without significantly increasing memory consumption? Note that I do not need dt_1, dt_2, ..., dt_N once they are combined.

+4
3 answers

Another approach using a temporary file for binding:

library(data.table)

nobs <- 10000
d1 <- d2 <- d3 <- data.table(a = rnorm(nobs), b = rnorm(nobs))
ll <- c('d1', 'd2', 'd3')
tmp <- tempfile()

# Write all tables to one file, writing the header only for the first one
for (i in seq_along(ll)) {
  write.table(get(ll[i]), tmp, append = (i != 1), row.names = FALSE, col.names = (i == 1))
}

# Remove the original objects (the gc reclaims the memory if needed when loading the file)
rm(list = ll)

# Read the file into the new object
dt <- fread(tmp)

# Remove the temporary file
unlink(tmp)

This is slower than rbind, but it can help when memory is the limiting factor.
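The same temporary-file idea can be sketched with data.table's own fwrite, removing each original right after it is written. This is only a sketch under a few assumptions: the small tables and the names in `nms` are made up for illustration, and doubles written to text may not round-trip bit for bit.

```r
library(data.table)

# Small illustrative tables (assumed data, not from the question)
d1 <- data.table(a = rnorm(5), b = rnorm(5))
d2 <- copy(d1)
d3 <- copy(d1)
nms <- c("d1", "d2", "d3")

tmp <- tempfile(fileext = ".csv")

for (nm in nms) {
  first <- !file.exists(tmp)   # tempfile() only returns a name, so the file does not exist yet
  # Write the header only with the first table, append all later ones
  fwrite(get(nm), tmp, append = !first, col.names = first)
  rm(list = nm)                # drop the original as soon as it is on disk
}

dt <- fread(tmp)   # read everything back as one table
unlink(tmp)        # remove the temporary file
```

Because each table is removed as soon as it is written, the originals and the combined table never need to be in memory at the same time.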

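Of course, if your original objects are loaded from files in the first place, it is better to concatenate the files before loading them into R, using a tool built for working with files (cat, awk, etc.).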
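As a rough sketch of concatenating files outside R, fread can read the output of a shell command through its `cmd` argument. This assumes a Unix-like system with awk on the PATH; the `part_*.csv` files are hypothetical stand-ins for the real inputs.

```r
library(data.table)

# Hypothetical input files standing in for the original data sources
files <- file.path(tempdir(), sprintf("part_%d.csv", 1:3))
for (f in files) fwrite(data.table(a = 1:10, b = 11:20), f)

# awk prints every line except the header line of the second and later files
cmd <- paste(
  "awk 'FNR == 1 && NR != 1 { next } { print }'",
  paste(shQuote(files), collapse = " ")
)

dt <- fread(cmd = cmd)   # only the combined table is ever loaded into R
unlink(files)            # clean up the illustrative files
```

With this approach the individual tables never exist as R objects, so only the combined table takes up memory.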

+5

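The memory usage doubles because the bound table is a copy of the data, so you can remove the originals once they are bound.

For example: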

#create some data
nobs=10000
d1 <- d2 <- d3 <-  data.table(a=rnorm(nobs),b=rnorm(nobs))
dt <- rbindlist(list(d1,d2,d3))

sort( sapply(ls(),function(x){object.size(get(x))}))
  nobs     d1     d2     d3     dt 
    48 161232 161232 161232 481232 

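If that is not enough (for example, there is not enough memory to hold the originals and the combined table at once), you can bind the tables one at a time with get, removing each as you go: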

mydts <- c("d1","d2","d3") #vector of datatable names

dt<- data.table() #empty datatable to bind objects to

for(d in mydts){
  dt <- rbind(dt, get(d))
  rm(list=d)
  gc() #garbage collection
}
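The loop above could be wrapped in a small function so it can be reused. This is a sketch only: `bind_and_drop` and its `envir` parameter are hypothetical names, not part of data.table.

```r
library(data.table)

# Hypothetical helper: binds the tables named in `nms` one at a time and
# removes each original from `envir` right after it has been appended.
bind_and_drop <- function(nms, envir = parent.frame()) {
  out <- data.table()                          # empty table to bind onto
  for (nm in nms) {
    out <- rbind(out, get(nm, envir = envir))  # append the next table
    rm(list = nm, envir = envir)               # drop the original
    gc()                                       # let R release the freed memory
  }
  out
}

d1 <- data.table(a = 1:3, b = letters[1:3])
d2 <- data.table(a = 4:6, b = letters[4:6])
dt <- bind_and_drop(c("d1", "d2"))
```

Note that each rbind call still copies the accumulated table, so during the last iteration the old and the new result exist side by side for a moment.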
+3

I think <<- and get can help you with this.

UPDATE: <<- is not required.

df1 <- data.frame(x1=1:4, x2=letters[1:4], stringsAsFactors=FALSE)
df2 <- df1
df3 <- df1

dt.lst <- c("df2", "df3")

for (i in dt.lst) {
  df1 <- rbind(df1, get(i))
  rm(list=i)
}

df1
+2

Source: https://habr.com/ru/post/1624065/

