I am running some Python code on a cluster. One of the rules Slurm imposes on me is that my job runs under a time limit. In most cases this is not a problem, since I can checkpoint my code with pickle and restart it.
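For context, here is a minimal sketch of the kind of pickle checkpointing I mean. The file name `state.pkl`, the helper names, and the squared-number "calculation" are placeholders for illustration only, not my actual code.

```python
import os
import pickle
import tempfile

CHECKPOINT = "state.pkl"  # hypothetical checkpoint file name


def save_checkpoint(state, path=CHECKPOINT):
    # Write to a temporary file in the same directory, then rename it over
    # the real checkpoint, so a job killed mid-write does not leave a
    # truncated checkpoint behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path=CHECKPOINT):
    # Return the saved state, or None if no checkpoint exists yet.
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(total_steps, path=CHECKPOINT):
    # Resume from the last completed step if a checkpoint is present.
    state = load_checkpoint(path) or {"step": 0, "results": []}
    for step in range(state["step"], total_steps):
        state["results"].append(step * step)  # placeholder calculation
        state["step"] = step + 1
        save_checkpoint(state, path)
    return state
```

If the job is stopped partway through, calling `run` again with the same path picks up at the stored `step` instead of starting over.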
At the end of the run, however, I need to write out all my data (I cannot write anything until all the calculations have finished). This can take a while, because the data can be very large.
My problem is that in some cases Slurm terminates the job because it has exceeded the allowed runtime.
Is there a way to interrupt the write operation, stop the code, and then restart from where I left off?