I think the easiest solution would be to split your data into smaller pieces and run mclapply separately on each piece. That way you can set the number of cores for each mclapply run. This probably works better for calculations whose runtime has a small variance.
I created a small quick and dirty layout of what this might look like:
library(parallel)
library(lubridate)
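A minimal sketch of the chunking idea, not the original layout. The workload `slow_square`, the helper `run_in_chunks`, and the core counts per chunk are all hypothetical names and values chosen for illustration; only `mclapply` and its `mc.cores` argument come from the parallel package.

```r
library(parallel)

# Hypothetical workload: stands in for whatever per-element computation you run.
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

# Hypothetical helper: runs mclapply chunk by chunk,
# with a separate core count for each chunk.
run_in_chunks <- function(data, n_chunks, cores_per_chunk) {
  stopifnot(length(cores_per_chunk) == n_chunks)
  # Assign each element to one of n_chunks contiguous groups.
  groups <- ceiling(seq_along(data) * n_chunks / length(data))
  chunks <- split(data, groups)
  results <- vector("list", n_chunks)
  for (i in seq_along(chunks)) {
    # mclapply relies on forking, so fall back to one core on Windows.
    cores <- if (.Platform$OS.type == "windows") 1L else cores_per_chunk[i]
    results[[i]] <- mclapply(chunks[[i]], slow_square, mc.cores = cores)
  }
  # Flatten the per-chunk lists back into one list in the original order.
  unlist(results, recursive = FALSE)
}

# Example: four chunks, the first two with 2 cores, the last two with 1 core.
out <- run_in_chunks(1:20, n_chunks = 4, cores_per_chunk = c(2, 2, 1, 1))
```

The chunks run one after another, so the core count can change between runs. Within a chunk, mclapply waits for its slowest element, which is why this approach suits workloads with similar runtimes per element.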