Basically, I have many tasks (in batches of about 1000), and their execution times vary widely (from less than a second to 10 minutes). I know that if a task runs for more than a minute, I can kill it. These tasks are steps in optimizing a data-mining model (they are not dependent on each other), and they spend most of their time inside a C extension function, so they will not cooperate if I try to cancel them gracefully.
Is there a distributed task queue that fits this scheme? As far as I know, Celery only lets you cancel tasks that cooperate, but I could be wrong.
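From reading the Celery docs, the two mechanisms that look relevant are the hard `time_limit` option and `revoke(terminate=True)`; both appear to kill the worker process with a signal rather than waiting for the task to cooperate, but I am not sure this holds while the process is inside a C extension. A minimal sketch (the broker URL and `expensive_c_call` are just placeholders):

```python
from celery import Celery

app = Celery('optimizer', broker='redis://localhost:6379/0')  # placeholder broker

# Hard time limit: after 60 s the pool supervisor kills the worker
# process outright, which should apply even while it is stuck in C code.
@app.task(time_limit=60)
def optimize_step(params):
    return expensive_c_call(params)  # placeholder for the real payload

# Revoking a running task with terminate=True signals the worker process
# instead of asking the task itself to stop:
# result = optimize_step.delay(params)
# result.revoke(terminate=True, signal='SIGKILL')
```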
I recently asked a similar question about killing hanging functions in pure Python: Kill a dangling function in Python in a multi-threaded environment.
I think I could subclass the Celery task so that it spawns a new process, runs its payload there, and terminates it if it takes too long, but then the cost of initializing a new interpreter for every task would kill me.
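For illustration, a bare `multiprocessing` wrapper of the kind I have in mind (a sketch only, assuming a fork-based start method on Linux; `run_with_timeout` is my own name, not a Celery API); spawning a fresh process per task like this is exactly the interpreter start-up overhead I am worried about:

```python
import multiprocessing

def run_with_timeout(func, args=(), timeout=60):
    """Run func(*args) in a child process; kill it if it exceeds timeout seconds."""
    queue = multiprocessing.Queue()

    def worker():
        queue.put(func(*args))

    proc = multiprocessing.Process(target=worker)
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        # SIGTERM kills the child even while it is inside C code, since
        # Python installs no handler for it by default.
        proc.terminate()
        proc.join()
        raise TimeoutError('task exceeded %d s' % timeout)
    return queue.get()
```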