I have a binary (say a.out) that I want to call with different configurations. I want to run these configs in parallel on a 40-core machine. Below is a sketch of my code.
It is very simple: I generate the configurations and hand each one to a worker, and the worker calls the binary with that configuration via subprocess, redirecting the output to a file. Let me call this script run.py.
import subprocess
from multiprocessing import Pool

def worker(cmdlist, filename):
    # here it essentially executes: a.out config > outputfile
    outputfile = open(filename, 'wb')
    subprocess.call(cmdlist, stderr=outputfile, stdout=outputfile)
    outputfile.close()

def main():
    pool = Pool(processes=40)
    results = []
    for config in all_configs:
        filename, cmdlist = genCmd(config)
        res = pool.apply_async(worker, [cmdlist, filename])
        results.append(res)
    for res in results:
        res.get()
    pool.close()
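For context, genCmd and all_configs are not shown above. A minimal sketch of what they look like in my setup would be something like the following; the config fields, paths, and arguments here are hypothetical, not my real ones:

def genCmd(config):
    # Hypothetical: build an output path on the network drive and an argv list for a.out.
    filename = '/mnt/netdrive/results/%s.log' % config['name']
    cmdlist = ['./a.out', str(config['n']), str(config['tol'])]
    return filename, cmdlist

all_configs = [{'name': 'run%d' % i, 'n': 128, 'tol': 1e-6} for i in range(100)]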
But after I launched it, I realized that it was not creating as many processes as I wanted. I definitely submitted more than 40 jobs, but in top I only see about 20 instances of a.out.
I see many run.py processes in a sleeping state (i.e. "S" in top). When I do ps auf, I also see a lot of run.py processes in the "S+" state with no binary spawned under them; only about half of them spawned a.out.
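To narrow down which workers actually spawn the binary, I am thinking of adding some logging inside the worker, roughly like this (just a sketch, assuming the same worker signature as above; the extra .meta log file is hypothetical):

import os
import subprocess
import time

def worker(cmdlist, filename):
    # Record when the worker starts, which command it runs, and the child's
    # return code, so the log shows whether a.out was actually spawned.
    start = time.time()
    with open(filename, 'wb') as outputfile:
        with open(filename + '.meta', 'w') as meta:
            meta.write('worker pid %d starting: %s\n' % (os.getpid(), ' '.join(cmdlist)))
            ret = subprocess.call(cmdlist, stdout=outputfile, stderr=outputfile)
            meta.write('finished with code %d after %.1f s\n' % (ret, time.time() - start))
    return ret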
I wonder why this is happening. I am redirecting the output to a network drive, which may be the reason, but in top I only see about 10% wa (which, as I understand it, means 10% of the time waiting on I/O). I don't think that would account for 50% of the cores sitting idle. Also, each worker should at least spawn the binary instead of getting stuck in run.py. My binary's runtime is also quite long, so I really should see 40 jobs running for a long time.
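One thing I could try, to rule out the network drive, is to have each worker write the output to local disk first and only move it to the network drive once a.out has finished. A sketch of that idea (using /tmp as the local scratch directory, which is an assumption on my part):

import os
import shutil
import subprocess
import tempfile

def worker(cmdlist, filename):
    # Write a.out's output to a local temp file, then move it to the network
    # drive, so the child process never blocks on network I/O while running.
    fd, tmppath = tempfile.mkstemp(dir='/tmp')
    with os.fdopen(fd, 'wb') as outputfile:
        ret = subprocess.call(cmdlist, stdout=outputfile, stderr=outputfile)
    shutil.move(tmppath, filename)
    return ret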
Any other explanations? Anything I did wrong in my Python code?