It sounds like you are effectively checkpointing by hand: one job runs up to a checkpoint, the next job reads that checkpoint, does some work and writes a new checkpoint, and so on, 10 times. I'm not sure why you need to split it up the way you have. Why not just have a wrapper script that looks for the checkpoint file and resumes from it if it exists, or runs from scratch if it doesn't?
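A minimal sketch of such a wrapper, in Python. The file name `checkpoint.json`, the state layout and the "work" done per step are all placeholders I made up for illustration, so substitute whatever your real job reads and writes:

```python
import json
import os
import tempfile

CHECKPOINT = "checkpoint.json"  # hypothetical file name


def load_state(path=CHECKPOINT):
    """Return the saved state if a checkpoint exists, else a fresh state."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_state(state, path=CHECKPOINT):
    """Write to a temp file and rename it, so a killed job never leaves
    a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(n_steps, path=CHECKPOINT):
    """Resume from the checkpoint if present, otherwise start at step 0."""
    state = load_state(path)
    for step in range(state["step"], n_steps):
        state["total"] += step  # placeholder for the real work
        state["step"] = step + 1
        save_state(state, path)
    return state
```

With something like this you submit one job that calls `run(10)`. If it gets evicted and restarted, it picks up at the last completed step instead of needing ten chained jobs.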
Another option is to use the "Requirements" expression in the submit file and list only the 100 machines or cores that your jobs are allowed to run on. Something like:
Requirements = (machine == "astrolab01") || (machine == "astrolab02") || (machine == "astrolab03")
ensures that you never run more than three jobs at once, provided each machine has a single slot. If these machines have multiple cores, each core is a separate slot, so you need to name the slots instead, something like:
Requirements = (name == "slot1@astrolab01") || (name == "slot1@astrolab02")
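Typing out 100 of those clauses by hand is error-prone, so you could generate the line instead. A rough sketch in Python, where `build_requirements` is a helper name I made up and the host list is whatever your pool actually contains:

```python
def build_requirements(hosts, slot=None):
    """Build a Requirements line that ORs together one clause per host.

    With slot=None it matches on the machine attribute; with a slot
    name such as "slot1" it matches on name as "<slot>@<host>".
    """
    attr = "name" if slot else "machine"
    clauses = []
    for host in hosts:
        value = f"{slot}@{host}" if slot else host
        clauses.append(f'({attr} == "{value}")')
    return "Requirements = " + " || ".join(clauses)


if __name__ == "__main__":
    hosts = ["astrolab01", "astrolab02", "astrolab03"]
    print(build_requirements(hosts))
    print(build_requirements(hosts, slot="slot1"))
```

Paste the printed line into the submit file. Check what your pool actually advertises for these attributes first (for example, whether the machine value is a short or fully qualified host name), since the strings have to match exactly.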