Why does R store the loop variable / index / dummy variable in memory?

Question

I noticed that R keeps the index variable of a for loop in the global environment after the loop has finished, for example:

 for (ii in 1:5){ }
 print(ii)
 # [1] 5

Is it common for people to need this index after running a loop?

I never use it, and I have to remember to add rm(ii) after every loop that I run (firstly, because I'm anal about keeping my namespace clean, and secondly, for memory, because I sometimes loop over lists of data.tables; in my code right now I have 357 MB of dummy variables that are wasting space).

Is there an easy way around this annoyance? Ideal would be a global option (à la options(keep_for_index = FALSE)); something like for(ii in 1:5, keep_index = FALSE) would also be acceptable.

+6

for-loop r

MichaelChirico Apr 14 '15 at 23:28

2 answers

I agree with the comments above. Even if you need to use a for loop (relying only on side effects, not on function return values), it would be a good idea to structure your code into several functions and store your data in lists.

However, there is a way to “hide” the index and all temporary variables created inside the loop: call the `for` function in a separate environment:

 do.call(`for`, alist(i, 1:3, {
   # ...
   print(i)
   # ...
 }), envir = new.env())
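As a hedged aside, base R's local() also evaluates an expression in a fresh environment, so it should give a similar effect with ordinary for syntax. A minimal sketch; the names `squares`, `out` and `k` are made up for illustration:

```r
# Everything created inside local() lives in a temporary environment.
squares <- local({
  out <- numeric(0)
  for (k in 1:3) {
    out <- c(out, k^2)   # collect results inside the temporary environment
  }
  out                    # the last value of the block is returned by local()
})

# Neither the index nor the temporary vector was created in the calling environment.
k_leaked   <- exists("k", inherits = FALSE)
out_leaked <- exists("out", inherits = FALSE)
```

Only `squares` is left behind; `k` and `out` disappear together with the temporary environment.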

But ... if you can put your code in a function, the solution will be more elegant:

 for_each <- function(x, FUN) {
   for(i in x) {
     FUN(i)
   }
 }
 for_each(1:3, print)

Note that when you use a construct like "for_each" you never even see the index variable.
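To illustrate why the function-based approach leaves nothing behind, here is a minimal sketch: the index is created in the function's own evaluation frame, which is discarded when the call returns. `sum_squares` and `idx` are hypothetical names chosen for this example.

```r
# Hypothetical helper: the loop index `idx` exists only while the call is running.
sum_squares <- function(x) {
  total <- 0
  for (idx in x) {
    total <- total + idx^2
  }
  total
}

result <- sum_squares(1:3)   # 1 + 4 + 9

# The index was never created in the calling environment.
idx_leaked <- exists("idx", inherits = FALSE)
```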

+2

bergant Apr 15 '15 at 2:21

Mrflick · Accepted Answer · 2015-04-15T02:33:21+0000

To do what you suggest, R would have to change the scoping rules for for loops. This will probably never happen, because I'm sure there is code in packages that relies on the current behavior. You may not use the index after the for loop, but given that loops can break at any time, the final iteration value is not always known in advance. And having this as a global option would, again, cause problems with existing code in working packages.

As already pointed out, it is far more common in R to use sapply or lapply calls. Something like
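As an illustration of the kind of code that depends on this behavior, here is a small sketch in which the index is read after the loop to find out where it stopped. The vector `x` is made-up data and `first_negative_pos` is a hypothetical name.

```r
x <- c(3, 8, -2, 5)   # made-up data for illustration

for (i in seq_along(x)) {
  if (x[i] < 0) break   # stop at the first negative value
}

# The loop variable survives the loop and records where it stopped.
first_negative_pos <- i
```

If `x` contained no negative value, `i` would simply end up equal to `length(x)`, so real code would need an extra check.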

 for(i in 1:4) { lm(data[, 1] ~ data[, i]) }

becomes

 sapply(1:4, function(i) { lm(data[, 1] ~ data[, i]) })
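A runnable variant of the same idea, sketched here with the built-in mtcars data set rather than the asker's data (an assumption made purely for illustration): lapply keeps the fitted models in a list, and the index `j` exists only inside the anonymous function.

```r
# Fit mpg (column 1) against columns 2 to 4 of mtcars, one model per column.
fits <- lapply(2:4, function(j) {
  lm(mtcars[, 1] ~ mtcars[, j])
})

# Pull the slope out of each fitted model.
slopes <- sapply(fits, function(fit) coef(fit)[[2]])

# No index variable is left behind in the calling environment.
j_leaked <- exists("j", inherits = FALSE)
```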

You should not be afraid of functions in R. After all, R is a functional language.

It is fine to use for loops when you need more control, but you will have to take care of removing the index variable with rm(), as you pointed out. Unless you use a different index variable in each loop, I am surprised that they accumulate. I am also surprised that in your case, if they are data.tables, they take up extra memory, since data.tables do not make deep copies by default. The only memory "price" you pay is a simple pointer.
