Assignment via `:=` in a `for` loop (R data.table)

I am trying to assign some new variables in a for loop (I want to create some variables with a common structure, but which depend on the subsample).

For the life of me, I have tried to reproduce this error on sample data, and I can't. Here is code that works and gets at the gist of what I want to do:

    library(data.table)

    DT <- data.table(
      id     = rep(1:100, each = 20L),
      period = rep(-9:10, 100L),
      grp    = rep(sample(4L, size = 100L, replace = TRUE), each = 20L),
      y      = runif(2000, min = 0, max = 5),
      key    = c("id", "period")
    )
    DT[, x := cumsum(y), by = id]
    DT2 <- DT[id %in% seq(1, 100, by = 2)]
    DT3 <- DT[id %in% seq(1, 100, by = 3)]

    # For each table: key by grp, join in the per-group sum at period 0,
    # then restore the original key.
    for (dd in list(DT, DT2, DT3)) {
      setkey(setkey(dd, grp)[dd[period == 0, sum(x), by = grp], x_at_0_by_grp := V1], id, period)
    }

This works fine. However, when I do the same with my own data, it produces an Invalid .internal.selfref warning (and does not create the variable I want):

    In `[.data.table`(setkey(dt, treatment), dt[posting_rel == 0, sum(current_balance), :
      Invalid .internal.selfref detected and fixed by taking a copy of the whole table
      so that := can add this new column by reference. At an earlier point, this
      data.table has been copied by R (or been created manually using structure() or
      similar). Avoid key<-, names<- and attr<- which in R currently (and oddly) may
      copy the whole data.table. Use set* syntax instead to avoid copying: ?set,
      ?setnames and ?setattr. Also, in R<=v3.0.2, list(DT1,DT2) copied the entire DT1
      and DT2 (R's list() used to copy named objects); please upgrade to R>v3.0.2 if
      that is biting. If this message doesn't help, please report to datatable-help so
      the root cause can be fixed.

In fact, when I subset my data to only the columns needed for the merge, it also works fine with my data (although that does not preserve the original data sets).

This suggests that it is a key problem, but I explicitly set the keys at every step. I am completely lost as to how to debug this from here, because I cannot reproduce the error except on my full data set.
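One way to narrow this down, as a minimal sketch that assumes the warning text is right about an earlier copy: check whether the table inside the list is still the same object, and whether it still has spare column slots for `:=`. `address()` and `truelength()` are data.table functions; the toy table `DT_demo` is made up for illustration and stands in for the real data.

```r
library(data.table)

# Hypothetical toy table, standing in for the real data.
DT_demo <- data.table(id = 1:6, grp = rep(1:2, 3), x = runif(6))

lst <- list(DT_demo)

# TRUE if list() stored the same object rather than a copy of it.
same_object <- address(DT_demo) == address(lst[[1]])

# data.table over-allocates column slots so that := can add columns in place.
# If truelength is not larger than length, := has to re-allocate first.
has_spare_slots <- truelength(lst[[1]]) > length(lst[[1]])

print(same_object)
print(has_spare_slots)
```

Running the same two checks on the real tables, right before the failing join, may show at which step a copy was made.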

If I break the operation into stages, the error occurs at the merge step:

    for (dd in list(DT, DT2, DT3)) {
      dummy <- dd[period == 0, sum(x), by = grp]
      setkey(dd, grp)
      dd[dummy, x_at_0_by_grp := V1]  # ***ERROR HERE***
      setkey(dd, id, period)
    }

Quick update: the error also occurs if I use lapply instead of a for loop.

Any ideas what is going on here?


UPDATE: I came up with a workaround:

    nnames <- c("dt", "dt2", "dt3")
    dt_list <- list(DT, DT2, DT3)
    for (ii in 1:3) {
      dummy <- copy(dt_list[[ii]])
      dummy[, x_at_0_by_grp := sum(x[period == 0]), by = grp]
      assign(nnames[ii], dummy)
    }
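The same idea can be sketched without `assign()`, by keeping the tables in a named list and replacing each element with its modified copy. This is only an illustration on made-up data: the table `toy` and the subsets chosen here are hypothetical, and only the column names follow the example above.

```r
library(data.table)

# Hypothetical toy data with the same column names as the example above.
set.seed(1)
toy <- data.table(
  id     = rep(1:6, each = 3L),
  period = rep(-1:1, 6L),
  grp    = rep(c(1L, 2L), each = 3L, times = 3L),
  x      = runif(18)
)
dt_list <- list(dt = toy, dt2 = toy[id %in% c(1, 3, 5)], dt3 = toy[id %in% c(1, 4)])

# copy() returns a fresh, over-allocated table, so := can add the column
# by reference; each list element is then replaced by its modified copy.
dt_list <- lapply(dt_list, function(d) {
  d <- copy(d)
  d[, x_at_0_by_grp := sum(x[period == 0]), by = grp]
  d
})

names(dt_list[["dt2"]])
```

Because every element is copied first, `toy` itself is left without the new column.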

I would like to understand what is happening, and what the best way is to assign variables iteratively in situations like this.

1 answer

With 20-30 criteria, keeping them outside a list (with separate names like dt2, etc.) is too clumsy, so I will just assume you have them all in dt_list.

I suggest building tables that contain only the statistics you compute, and then binding them together with rbindlist:

    xxt <- rbindlist(lapply(seq_along(dt_list), function(i) {
      dt_list[[i]][, list(cond = i, xx = sum(x[period == 0])), by = grp]
    }))

which creates

        grp cond       xx
     1:   1    1 623.3448
     2:   2    1 784.8438
     3:   4    1 699.2362
     4:   3    1 367.7196
     5:   1    2 323.6268
     6:   4    2 307.0374
     7:   2    2 447.0753
     8:   3    2 185.7377
     9:   1    3 275.4897
    10:   4    3 243.0214
    11:   2    3 149.6041
    12:   3    3 166.3626

You can easily merge these back in if you really want the variables in the original tables. For example, for dt2:

    myi <- 2
    setkey(dt_list[[myi]], grp)[xxt[cond == myi, list(grp, xx)]]
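Putting the two steps together, here is a minimal end-to-end sketch on made-up data. It swaps the keyed join above for an update join with `on = "grp"`, which needs a data.table version that supports the `on` argument. The table `toy`, the subsets, and the loop variable `k` are hypothetical.

```r
library(data.table)

# Hypothetical toy data; column names follow the question.
set.seed(1)
toy <- data.table(
  id     = rep(1:6, each = 3L),
  period = rep(-1:1, 6L),
  grp    = rep(c(1L, 2L), each = 3L, times = 3L),
  x      = runif(18)
)
dt_list <- list(copy(toy), toy[id <= 4], toy[id >= 3])

# One summary row per (table, group), as in the rbindlist step.
xxt <- rbindlist(lapply(seq_along(dt_list), function(k) {
  dt_list[[k]][, list(cond = k, xx = sum(x[period == 0])), by = grp]
}))

# Update join: attach the matching summary to each table, matching on grp.
# Assigning the result back keeps the list in sync even if := had to
# work on a copy of the table.
for (k in seq_along(dt_list)) {
  dt_list[[k]] <- dt_list[[k]][xxt[cond == k], x_at_0_by_grp := i.xx, on = "grp"]
}
```

The `i.` prefix in `i.xx` refers to the column `xx` of the table passed in the join position, here the filtered `xxt`.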

This does not fix the error you are running into, but I think it is a better approach.


Source: https://habr.com/ru/post/985151/

