Collecting an unknown number of results in a loop

What is the idiomatic way of collecting results in a loop in R when the number of final results is not known in advance? Here is a toy example:

    results = vector('integer')
    i = 1L
    while (i < bigBigBIGNumber) {
        if (someCondition(i))
            results = c(results, i)
        i = i + 1
    }
    results

The problem with this example is that (I assume) it will have quadratic complexity, since the vector has to be reallocated each time an element is appended. (Is this right?) I am looking for a solution that avoids this.
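A rough way to check this suspicion is to time growing with c() against filling a pre-allocated vector; the sketch below is purely illustrative and uses an arbitrary size, not anything from the original post.

    grow <- function(n) {
        out <- integer(0)
        for (i in seq_len(n)) out <- c(out, i)   # copies the whole vector on every append
        out
    }
    prealloc <- function(n) {
        out <- integer(n)
        for (i in seq_len(n)) out[i] <- i        # writes in place, no reallocation
        out
    }
    system.time(grow(50000))       # grows roughly quadratically with n
    system.time(prealloc(50000))   # grows roughly linearly with n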

I found Filter , but it requires generating 1:bigBigBIGNumber up front, which I want to avoid to save memory. (Question: does for (i in 1:N) also pre-generate 1:N and keep it in memory?)
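For reference, the Filter route would look something like the sketch below (using the same someCondition and bigBigBIGNumber placeholders as above); it materializes the whole sequence first, which is exactly the memory cost I would like to avoid.

    # Filter keeps the elements for which the predicate returns TRUE,
    # but the full sequence must exist in memory first.
    results <- Filter(someCondition, seq_len(bigBigBIGNumber))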

I could build something resembling a linked list, like this:

    results = list()
    i = 1L
    while (i < bigBigBIGNumber) {
        if (someCondition(i))
            results = list(results, i)
        i = i + 1
    }
    unlist(results)

(Note that this is not concatenation: it builds a nested structure like list(list(list(1),2),3) , which is then flattened with unlist .)
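A tiny concrete illustration of that flattening step:

    nested <- list(list(list(1L), 2L), 3L)   # the shape built by the loop above
    unlist(nested)
    # [1] 1 2 3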

Is there a better way than this? What is the idiomatic approach that is commonly used? (I am very new to R.) Suggestions for both compact (easy-to-write) and fast code are welcome, but I am mainly interested in speed and memory efficiency.

+4
4 answers

Here is an approach that doubles the size of the output list as it fills up, achieving roughly linear computation time, as the benchmark timings below show:

    test <- function(bigBigBIGNumber = 1000) {
        n <- 10L
        results <- vector("list", n)
        m <- 0L
        i <- 1L
        while (i < bigBigBIGNumber) {
            if (runif(1) > 0.5) {          # stand-in for someCondition(i)
                m <- m + 1L
                results[[m]] <- i
                if (m == n) {              # list is full: double its size
                    results <- c(results, vector("list", n))
                    n <- n * 2L
                }
            }
            i <- i + 1L
        }
        unlist(results)
    }

    system.time(test(1000))
    #  user  system elapsed
    # 0.008   0.000   0.008
    system.time(test(10000))
    #  user  system elapsed
    # 0.090   0.002   0.093
    system.time(test(100000))
    #  user  system elapsed
    # 0.885   0.051   0.936
    system.time(test(1000000))
    #  user  system elapsed
    # 9.428   0.339   9.776
+3

If you cannot compute 1:bigBigNumber , count the entries, create the vector, then fill it:

    num <- 0L
    i <- 0L
    while (i < bigBigNumber) {
        if (someCondition(i)) num <- num + 1L
        i <- i + 1L
    }

    result <- integer(num)   # allocate once, now that the count is known
    num <- 0L
    i <- 0L                  # reset the counter for the second pass
    while (i < bigBigNumber) {
        if (someCondition(i)) {
            num <- num + 1L
            result[num] <- i
        }
        i <- i + 1L
    }

(This code has not been verified.)

If you can compute 1:bigBigBIGNumber , the following will also work.

I assume you want to call a function rather than simply collect the indices themselves. Something like this may be closer to what you want:

    values <- seq(bigBigBIGNumber)
    sapply(values[someCondition(values)], my_function)
+2

Closer to the second approach you showed:

    results <- list()
    for (i in ...) {
        ...
        results[[i]] <- ...
    }

Note that i does not have to be an integer ; it can be a character string, etc.

Alternatively, you can use results[[length(results) + 1]] <- ... if needed, but if you already have an iterator you probably will not need to.
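A minimal sketch of that appending pattern, with a made-up condition purely for illustration:

    results <- list()
    for (x in 1:10) {
        if (x %% 2 == 0)                          # stand-in condition
            results[[length(results) + 1]] <- x   # append one element
    }
    unlist(results)
    # [1]  2  4  6  8 10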

+1

Presumably there is a maximum size you are willing to tolerate; pre-allocate and fill up to that level, then trim if necessary. This avoids the risk of not being able to satisfy a request to double in size, even when only a little additional memory might be needed; it fails early, and involves only one allocation rather than log(n) re-allocations. Here is a function that takes a maximum size, a generating function, and a token that the generating function returns when it has nothing more to generate. We collect up to n results before returning.

    filln <- function(n, FUN, ..., RESULT_TYPE = "numeric", DONE_TOKEN = NA_real_) {
        results <- vector(RESULT_TYPE, n)
        i <- 0L
        while (i < n) {
            ans <- FUN(..., DONE_TOKEN = DONE_TOKEN)
            if (identical(ans, DONE_TOKEN))
                break
            i <- i + 1L
            results[[i]] <- ans
        }
        if (i == n)
            warning("intolerably large result")
        else
            length(results) <- i
        results
    }

Here is the generator

    fun <- function(thresh, DONE_TOKEN) {
        x <- rnorm(1)
        if (x > thresh) DONE_TOKEN else x
    }

and in action

    > set.seed(123L); length(filln(10000, fun, 3))
    [1] 163
    > set.seed(123L); length(filln(10000, fun, 4))
    [1] 10000
    Warning message:
    In filln(10000, fun, 4) : intolerably large result
    > set.seed(123L); length(filln(100000, fun, 4))
    [1] 23101

We can get a rough idea of the overhead by comparing against a version where we know in advance how much space is required:

    f1 <- function(n, FUN, ...) {
        i <- 0L
        result <- numeric(n)
        while (i < n) {
            i <- i + 1L
            result[i] <- FUN(...)
        }
        result
    }

Here we check the timing and that the two approaches give identical results:

    > set.seed(123L); system.time(res0 <- filln(100000, fun, 4))
       user  system elapsed
      0.944   0.000   0.948
    > set.seed(123L); system.time(res1 <- f1(23101, fun, 4))
       user  system elapsed
      0.688   0.000   0.689
    > identical(res0, res1)
    [1] TRUE

which, for this example, is of course dwarfed by the simple vectorized solution(s):

    set.seed(123L); system.time(res2 <- rnorm(23101))
    identical(res0, res2)
+1

Source: https://habr.com/ru/post/1480380/

