Presumably there is a maximum size you are willing to tolerate; pre-allocate and fill up to that level, then trim if necessary. This avoids the risk of not being able to satisfy a request to double the allocation even when only a small amount of additional memory is actually required; it fails early, and involves only one allocation rather than the log(n) re-allocations of the doubling strategy. Here is a function that takes the maximum size, the generating function, and the token the generating function returns when there is nothing left to generate. We collect up to n results before returning.
filln <- function(n, FUN, ..., RESULT_TYPE="numeric", DONE_TOKEN=NA_real_)
{
    results <- vector(RESULT_TYPE, n)
    i <- 0L
    while (i < n) {
        ans <- FUN(..., DONE_TOKEN=DONE_TOKEN)
        if (identical(ans, DONE_TOKEN))
            break
        i <- i + 1L
        results[[i]] <- ans
    }
    if (i == n)
        warning("intolerably large result")
    else
        length(results) <- i
    results
}
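For contrast, a grow-by-doubling version would look roughly like the sketch below (an added illustration, not part of the answer above; the name fill_doubling is hypothetical). Each time the vector is enlarged, `length<-` allocates a new vector and copies the old one, so up to about log2(n) re-allocations occur, and the largest ones come last.

## sketch of the doubling alternative, for comparison only
fill_doubling <- function(FUN, ..., DONE_TOKEN = NA_real_)
{
    results <- numeric(1)
    i <- 0L
    repeat {
        ans <- FUN(..., DONE_TOKEN = DONE_TOKEN)
        if (identical(ans, DONE_TOKEN))
            break
        i <- i + 1L
        if (i > length(results))
            ## re-allocate and copy; happens ~log2(n) times
            length(results) <- 2L * length(results)
        results[[i]] <- ans
    }
    length(results) <- i
    results
}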
Here is the generator
fun <- function(thresh, DONE_TOKEN)
{
    x <- rnorm(1)
    if (x > thresh) DONE_TOKEN else x
}
and in action
> set.seed(123L); length(filln(10000, fun, 3))
[1] 163
> set.seed(123L); length(filln(10000, fun, 4))
[1] 10000
Warning message:
In filln(10000, fun, 4) : intolerably large result
> set.seed(123L); length(filln(100000, fun, 4))
[1] 23101
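The RESULT_TYPE and DONE_TOKEN arguments let the same machinery collect other result types. As a small added illustration (the generator and the name chr_fun are hypothetical, not from the answer above), a character-valued generator might be used like this:

## hypothetical character-valued generator: draws from a pool until it is empty
pool <- c("a", "b", "c")
chr_fun <- function(DONE_TOKEN) {
    if (length(pool) == 0L)
        return(DONE_TOKEN)
    x <- pool[[1L]]
    pool <<- pool[-1L]    # consume one element from the (global) pool
    x
}
filln(10, chr_fun, RESULT_TYPE = "character", DONE_TOKEN = NA_character_)
## [1] "a" "b" "c"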
We can get a rough sense of the overhead by comparing against a version where we know in advance how much space is required
f1 <- function(n, FUN, ...)
{
    i <- 0L
    result <- numeric(n)
    while (i < n) {
        i <- i + 1L
        result[i] <- FUN(...)
    }
    result
}
Here we check the timing, and that the two approaches return identical values.
> set.seed(123L); system.time(res0 <- filln(100000, fun, 4))
   user  system elapsed 
  0.944   0.000   0.948 
> set.seed(123L); system.time(res1 <- f1(23101, fun, 4))
   user  system elapsed 
  0.688   0.000   0.689 
> identical(res0, res1)
[1] TRUE
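A single system.time run is fairly noisy; if more reliable numbers are wanted, the comparison could be repeated with the microbenchmark package (an added suggestion, assuming the package is installed):

## repeat the comparison several times to average out run-to-run variation
library(microbenchmark)
microbenchmark(
    filln = { set.seed(123L); filln(100000, fun, 4) },
    f1    = { set.seed(123L); f1(23101, fun, 4) },
    times = 10
)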
which for this example is, of course, dwarfed by the simple vector solution(s)
set.seed(123L); system.time(res2 <- rnorm(23101))
identical(res0, res2)