Change in the number of cores in parallel computing in R

Question

I am running parallel code in R using the parallel package and mclapply, which takes the number of cores as a parameter.

If I have a job that will run for a couple of days, is there a way to write (or wrap) my mclapply call so that it uses fewer cores during peak business hours and more cores off-peak?

+5

parallel-processing r

Megatron Oct 15 '15 at 14:11

1 answer

cryo111 · Answer 1 · 2015-10-15T15:46:17+0000

I think the easiest solution would be to split your data into smaller pieces and run mclapply separately on each piece. Then you can set the number of cores for each mclapply run. This probably works better for calculations whose runtimes have a small variance, because each mclapply call has to wait for its slowest element before the next piece can start.

I created a quick and dirty sketch of what this might look like:

    library(parallel)
    library(lubridate)

    # you would have to come up with your own function
    # for the number of cores to be used
    determine_cores <- function(hh) {
      # hh is the hour of the day
      if (hh > 17 || hh < 9) {
        return(4)
      } else {
        return(2)
      }
    }

    # prepare some sample data
    set.seed(1234)
    myData <- lapply(seq(1e-1, 1, 1e-1), function(x) rnorm(1e7, 0, x))

    # calculate SD with mclapply WITHOUT splitting the data into chunks
    # we need this for comparison
    compRes <- mclapply(myData, function(x) sd(x), mc.cores = 4)

    # this will hold the results of the separate mclapply calls
    res <- list()
    # starting position within myData
    chunk_start_pos <- 1
    calc_flag <- TRUE

    while (calc_flag) {
      # use the function defined above to determine how many cores we may use
      core_num <- determine_cores(lubridate::hour(Sys.time()))
      # determine end position of data chunk
      chunk_end_pos <- chunk_start_pos + core_num - 1
      if (chunk_end_pos >= length(myData)) {
        chunk_end_pos <- length(myData)
        calc_flag <- FALSE
      }
      message("Calculating elements ", chunk_start_pos, " to ", chunk_end_pos)
      # mclapply call on data chunk; store the result in res
      # (index up to chunk_end_pos so the last chunk never runs past the data)
      res[[length(res) + 1]] <- mclapply(myData[chunk_start_pos:chunk_end_pos],
                                         function(x) sd(x),
                                         mc.preschedule = FALSE,
                                         mc.cores = core_num)
      # calculate new start position
      chunk_start_pos <- chunk_start_pos + core_num
    }

    # let's compare the results
    all.equal(compRes, unlist(res, recursive = FALSE))
    # TRUE
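Since the question asks for a wrapper, the chunking loop above could be folded into a reusable function along these lines. This is only a minimal sketch: `chunked_mclapply`, `business_hours_cores`, and the `chunks_per_core` argument are hypothetical names made up for illustration, not part of the parallel package, and the core counts and hours are simply the ones from the example above. Note that mclapply relies on forking, so it cannot use more than one core on Windows.

```r
library(parallel)

# Hypothetical helper (not part of the parallel package): applies FUN to X
# chunk by chunk and asks cores_fun() for the core count before each chunk.
chunked_mclapply <- function(X, FUN, cores_fun, chunks_per_core = 1L, ...) {
  n <- length(X)
  res <- vector("list", n)
  pos <- 1L
  while (pos <= n) {
    # re-evaluate the allowed number of cores for this chunk
    cores <- max(1L, as.integer(cores_fun()))
    # the chunk holds cores * chunks_per_core elements, clipped to the data
    end <- min(n, pos + cores * chunks_per_core - 1L)
    idx <- pos:end
    res[idx] <- mclapply(X[idx], FUN, ...,
                         mc.preschedule = FALSE, mc.cores = cores)
    pos <- end + 1L
  }
  names(res) <- names(X)
  res
}

# Hypothetical policy using base R only: 2 cores from 09:00 to 17:59,
# 4 cores otherwise (the same thresholds as determine_cores above).
business_hours_cores <- function() {
  hh <- as.integer(format(Sys.time(), "%H"))
  if (hh > 17 || hh < 9) 4L else 2L
}

# small usage example
squares <- chunked_mclapply(as.list(1:10), function(v) v^2, business_hours_cores)
```

The core count can only change between chunks, so a larger `chunks_per_core` means less overhead from repeated mclapply calls but a slower reaction when peak hours begin or end.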
