Large Merge / Memory Management

I've hit a wall trying to merge a large file with a smaller one. I have read many other posts on memory management in R and have not been able to find a non-extreme (go 64-bit, upload to a cluster, etc.) way of solving it. I experimented a bit with the bigmemory package but could not find a solution. I thought I'd ask here before throwing my hands up in frustration.

The code I'm running looks like this:

    #rm(list=ls())
    localtempdir <- "F:/Temp/"

    memory.limit(size=4095)
    [1] 4095
    memory.size(max=TRUE)
    [1] 487.56
    gc()
             used (Mb) gc trigger  (Mb) max used  (Mb)
    Ncells 170485  4.6     350000   9.4   350000   9.4
    Vcells 102975  0.8   52633376 401.6 62529185 477.1

    client_daily <- read.csv(paste(localtempdir,"client_daily.csv",sep=""),header=TRUE)
    object.size(client_daily)
    >130MB

    sbp_demos <- read.csv(paste(localtempdir,"sbp_demos",sep=""))
    object.size(demos)
    >0.16MB

    client_daily <- merge(client_daily,sbp_demos,by.x="OBID",by.y="OBID",all.x=TRUE)
    Error: cannot allocate vector of size 5.0 MB

I guess I'm asking: are there any clever ways around this that don't involve buying new hardware?

  • I need to be able to merge to create a larger object.
  • Then I will need to do regressions, etc. with this larger object.

Should I give up? Or should bigmemory be able to help solve this?

Any guidance is greatly appreciated.

Details: R version 2.13.1 (2011-07-08), Platform: i386-pc-mingw32/i386 (32-bit), Intel Core 2 Duo @ 2.33 GHz, 3.48 GB RAM

1 answer

As Chase mentioned, you could try data.table or perhaps sqldf.

With either one, you'll likely get more out of it if you set up the indexes (keys) properly.

Using data.table you would:

    dt1 <- data.table(sbp_demos, key='OBID')
    dt2 <- data.table(client_daily, key='OBID')

    ## Do an INNER JOIN-like operation, where non-matching rows are removed
    mi <- dt1[dt2, nomatch=0]

    ## Do a RIGHT JOIN(?)-like operation ... all rows in dt2 will be returned.
    ## If there is no matching row in dt1, the values in the dt1 columns for
    ## the merged row will be NA
    mr <- dt1[dt2]
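To see what the two joins return, here is a small sketch on made-up data. The column names `visits` and `region` and all the values are hypothetical stand-ins for the real files; only `OBID`, `client_daily` and `sbp_demos` come from the question.

```r
library(data.table)

# Hypothetical toy data standing in for the real CSV files
client_daily <- data.frame(OBID = c(1L, 1L, 2L, 3L), visits = c(5, 7, 2, 9))
sbp_demos    <- data.frame(OBID = c(1L, 2L), region = c("east", "west"),
                           stringsAsFactors = FALSE)

# Keying sorts each table by OBID so the join can use a fast binary search
dt1 <- data.table(sbp_demos, key = "OBID")
dt2 <- data.table(client_daily, key = "OBID")

# Inner-join-like: the OBID 3 row is dropped because dt1 has no match for it
mi <- dt1[dt2, nomatch = 0]

# Keeps every row of dt2 (client_daily); region is NA where dt1 has no match.
# This should play the role of merge(client_daily, sbp_demos, all.x = TRUE).
mr <- dt1[dt2]

print(mi)
print(mr)
```

On the real data, `mr` is probably the one you want, since the question's `merge()` call uses `all.x=TRUE` with `client_daily` as the left table.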

If you go the sqldf route, look at example 4i on its website ... and make sure you use indexes correctly.
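A rough sketch of what that could look like, again on hypothetical toy data (`visits`, `region`, and the index names `idx_daily` and `idx_demos` are made up). It follows the pattern as I understand it from the sqldf documentation: the statements run in order and the last result is returned, and the `main.` prefix makes the final query read the already-indexed tables instead of uploading the data frames a second time. Treat it as a starting point rather than a tested recipe for your data.

```r
library(sqldf)

# Hypothetical toy data standing in for the real CSV files
client_daily <- data.frame(OBID = c(1L, 1L, 2L, 3L), visits = c(5, 7, 2, 9))
sbp_demos    <- data.frame(OBID = c(1L, 2L), region = c("east", "west"),
                           stringsAsFactors = FALSE)

merged <- sqldf(c(
  # Each "create index" uploads the data frame to SQLite and indexes OBID
  "create index idx_daily on client_daily(OBID)",
  "create index idx_demos on sbp_demos(OBID)",
  # LEFT JOIN keeps every client_daily row, like merge(..., all.x = TRUE)
  "select * from main.client_daily left join main.sbp_demos using (OBID)"
))

print(merged)
```

Because the join itself runs inside SQLite rather than in R's workspace, this may ease the memory pressure during the merge, although the merged result still has to fit in R once it is returned.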


Source: https://habr.com/ru/post/904268/