The most obvious problem is that you've fallen victim to one of the classic blunders: not preallocating the output vector `result`. Appending one value at a time can be very inefficient for large vectors.
In your case, `result` doesn't need to be a vector at all: you can accumulate the results in a single value:
```r
result <- 0
for (g in 1:nrow(set)) {
  # ... (loop body elided in the original; add the g-th term to result)
}
```
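For comparison, here is a small self-contained sketch of the three options: growing the vector, preallocating it, and keeping only a running sum. The per-iteration function `term()` and the loop length `n` are hypothetical stand-ins for the real computation; all three give the same total, and the point is that the last two avoid resizing `result` on every iteration.

```r
# Hypothetical stand-ins: n plays the role of nrow(set),
# term(g) plays the role of the g-th summand.
n <- 10000
term <- function(g) 1 / g^2

# 1. Growing the vector one element at a time (the pattern to avoid).
grow <- numeric(0)
for (g in 1:n) grow[g] <- term(g)

# 2. Preallocating the full-length vector once, then filling it in.
prealloc <- numeric(n)
for (g in 1:n) prealloc[g] <- term(g)

# 3. Accumulating into a single value, since only the sum is needed.
total <- 0
for (g in 1:n) total <- total + term(g)
```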
But I think the most important performance improvement you could make is to precompute the expressions that are currently computed over and over inside the `foreach` loop. You can do this with a separate `foreach` loop. I also suggest using `solve` differently to avoid a second matrix multiplication:
```r
X_gamma_list <- foreach(g=1:nrow(set)) %dopar% {
  X_gamma <- X[, which(set[g,] != 0)]
  I - (c/(1+c)) * (X_gamma %*% solve(crossprod(X_gamma), t(X_gamma)))
}
```
These calculations are now performed only once, rather than once for each column of `Y`, which in your case means 700 times fewer.
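The claim that `solve(A, B)` can replace an explicit inverse followed by a second multiplication can be checked on toy data. This sketch uses hypothetical `X`, `set`, and `c` values, and `lapply` as a sequential stand-in for the precomputing `foreach` loop. It also adds `drop = FALSE` so that a model with a single predictor still yields a one-column matrix.

```r
set.seed(1)
n_obs <- 50
X   <- matrix(rnorm(n_obs * 4), n_obs, 4)                  # hypothetical predictors
set <- rbind(c(1, 0, 1, 0), c(1, 1, 1, 1), c(0, 1, 0, 0))  # hypothetical model indicators
c   <- 4                                                    # hypothetical prior constant
I   <- diag(n_obs)

# solve(A, B) computes A^{-1} B directly, without forming the inverse.
X_gamma_list <- lapply(seq_len(nrow(set)), function(g) {
  X_gamma <- X[, which(set[g, ] != 0), drop = FALSE]
  I - (c/(1+c)) * (X_gamma %*% solve(crossprod(X_gamma), t(X_gamma)))
})

# The same matrices via an explicit inverse and an extra multiplication.
X_gamma_list_inv <- lapply(seq_len(nrow(set)), function(g) {
  X_gamma <- X[, which(set[g, ] != 0), drop = FALSE]
  I - (c/(1+c)) * (X_gamma %*% solve(t(X_gamma) %*% X_gamma) %*% t(X_gamma))
})
```

Both lists agree up to floating-point rounding. Each matrix depends only on `set[g, ]` and `X`, never on `Y`, which is why it can be computed once up front.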
In the same spirit, it makes sense to factor the expression `((1+c)^(-sum(set[g,])/2))` out of the loop, as Tim Riffe suggested, and `-T / 2` as well, while we're at it:
```r
a <- (1+c) ^ (-rowSums(set) / 2)
nT2 <- -T / 2
```
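As a quick check that the vectorized `rowSums` form gives the same coefficients as the per-iteration expression, here is a sketch with hypothetical `set`, `c`, and `T` values:

```r
set <- rbind(c(1, 0, 1, 0), c(1, 1, 1, 1), c(0, 1, 0, 0))  # hypothetical model indicators
c <- 4     # hypothetical prior constant
T <- 100   # hypothetical; like the question's code, this masks the T shorthand for TRUE

# Per-iteration form, as it would be recomputed inside the loop for every column of Y.
a_loop <- numeric(nrow(set))
for (g in seq_len(nrow(set))) a_loop[g] <- (1+c)^(-sum(set[g, ])/2)

# Hoisted form: one vectorized call, computed once before any loop runs.
a <- (1+c) ^ (-rowSums(set) / 2)
nT2 <- -T / 2
```

Inside the loop, `a[g]` and `nT2` are then simple lookups instead of fresh computations.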
To iterate over the columns of the zoo object `Y`, I suggest using the `isplitCols` function from the itertools package. Make sure you load itertools at the top of your script:
```r
library(itertools)
```
`isplitCols` lets you send each task only the columns it needs, rather than sending the entire object to all workers. The only trick is that you need to remove the `dim` attribute from the resulting zoo objects for your code to work, since `isplitCols` uses `drop=FALSE`.
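The effect of removing the `dim` attribute can be seen with a plain matrix in base R. This sketch does not use itertools or zoo; it only assumes that each task receives a single-column piece that still carries a `dim` attribute, which it imitates with `drop = FALSE`. The matrix `Y` here is a hypothetical stand-in for the zoo object.

```r
# Hypothetical stand-in for the zoo object Y: a plain 5 x 3 matrix.
Y <- matrix(1:15, nrow = 5)

# One single-column piece per task, extracted with drop = FALSE,
# so each piece is still a 5 x 1 matrix with a dim attribute.
chunks <- lapply(seq_len(ncol(Y)), function(j) Y[, j, drop = FALSE])

# Removing the dim attribute turns a piece into a plain vector,
# the same shape that Y[, i] would have produced.
Yi <- chunks[[2]]
dim(Yi) <- NULL
```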
Finally, here is the main `foreach` loop:
```r
denom <- foreach(Yi=isplitCols(Y, chunkSize=1), .packages='zoo') %dopar% {
  dim(Yi) <- NULL
  # ... (rest of the loop body elided in the original)
}
```
Note that I would not execute the inner loop in parallel. That would only make sense if there weren't enough columns in `Y` to keep all your processors busy. Parallelizing the inner loop can produce tasks that are too small, which effectively undoes the chunking of the work and can make the code much slower. It is much more important to execute the inner loop efficiently, since it runs over a large number of values of `g`.