This is a pretty interesting topic. I ran into a similar problem developing a highly parallel batch processing solution. I will share my conclusions, but before that I have to say that it is always a bad idea to use some special solution for any parallel system.
Debugging, optimization, and further development can be a nightmare without proper architecture support. Say we have three dependent tasks:

First decision
will represent an abstraction of a composite or compound task to allow dependent tasks to run in the correct order and get rid of delays, wait / locks, complex task management, etc.

I will use simplified code to illustrate this approach:
interface Task extends Runnable { } class CompoundTasks implements Task { private List<Task> tasks = ...; public void add(Task task) { tasks.add(task); } @Override public void run() { for(Task t : tasks) { t.run(); } } }
Second solution
will set tasks with explicit dependencies and inform the performers about it. In principle, the rule is quite simple - if the task has unresolved dependencies, it should be postponed. This approach can be easily implemented and works very well.

Please note that the second solution will present a tiny performance limitation due to the fact that some resources will be needed to check tasks, manage queues, etc.
Let me develop our task-based approach:
abstract class DependentTask implements Task { private List<DependentTask> dependencies = ...; public void addDependency(DependentTask task) { dependencies.add(task); } public boolean hasDependenciesResolved() { boolean result = true; for(DependentTask t : dependencies) { if(!t.hasDependenciesResolved()) { result = false; break; } } return result; } @Override public abstract void run(); } class TaskQueue<T extends DependentTask> implements Runnable { private Queue<T> queue = ...; @Override public void run() { while(true) { if(!queue.isEmpty()) { T task = queue.poll();
Both approaches are easily tracked, so you can always understand what is happening.
source share