Implementing this feature is not easy: you either need a separate queue per target (which makes the waiting code considerably more complicated) or a single queue from which you skip over tasks whose target is already at capacity (which adds performance overhead). You could try to extend ExecutorService to achieve this, but the extension looks non-trivial.
Updated answer / solution:
Thinking about this a bit more, the simplest solution to the blocking problem is to have both a blocking queue (as usual) and a queue map: one queue per target, plus, for each target, the number of threads still available for it. The queue map is only used for tasks that have already been taken off the normal blocking queue but cannot be executed yet because too many threads are already running for their target.
So, the execution flow looks roughly like this:

1. A task is submitted to the normal blocking queue as usual.
2. When a worker thread takes a task off that queue, it checks how many tasks are already running for that task's target.
3. If the target still has capacity, its counter is updated and the task runs immediately.
4. If the target is already at capacity, the task is parked in that target's queue in the queue map.
5. When a task finishes, its target's capacity is released and, if a parked task exists for that target, it is picked up and executed next.

A sketch of this flow in code follows below.
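To make that concrete, here is a minimal Java sketch of such an executor. This is not the original poster's code: the class and method names, the String target key, and the fixed maxPerTarget limit are all assumptions made for illustration. It wraps an ordinary ExecutorService (whose own blocking queue plays the role of the normal queue), counts running tasks per target, and parks over-limit tasks in a per-target queue map, draining them as tasks for that target finish.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: at most 'maxPerTarget' tasks per target key run at once.
// Tasks first pass through the delegate executor's normal blocking queue; only
// tasks that come off that queue while their target is at capacity are parked
// in the per-target queue map.
public class PerTargetLimitingExecutor {

    private final ExecutorService delegate;
    private final int maxPerTarget;
    private final Map<String, Integer> running = new HashMap<>();
    private final Map<String, Queue<Runnable>> deferred = new HashMap<>();

    public PerTargetLimitingExecutor(ExecutorService delegate, int maxPerTarget) {
        this.delegate = delegate;
        this.maxPerTarget = maxPerTarget;
    }

    public void submit(String target, Runnable task) {
        // Goes through the delegate's ordinary blocking queue first.
        delegate.execute(() -> runOrDefer(target, task));
    }

    private void runOrDefer(String target, Runnable first) {
        Runnable current = first;
        while (current != null) {
            synchronized (this) {
                if (running.getOrDefault(target, 0) >= maxPerTarget) {
                    // Too many tasks already running for this target: park it.
                    deferred.computeIfAbsent(target, t -> new ArrayDeque<>()).add(current);
                    return;
                }
                running.merge(target, 1, Integer::sum);
            }
            try {
                current.run();
            } catch (RuntimeException e) {
                e.printStackTrace(); // a failing task must not break the bookkeeping
            }
            synchronized (this) {
                running.merge(target, -1, Integer::sum);
                Queue<Runnable> q = deferred.get(target);
                // Keep this worker busy with the next parked task for the same target.
                current = (q != null) ? q.poll() : null;
            }
        }
    }

    public void shutdown() {
        delegate.shutdown();
    }

    // Tiny usage example: with maxPerTarget = 1, tasks for "hostA" run one at a time.
    public static void main(String[] args) {
        PerTargetLimitingExecutor ex =
                new PerTargetLimitingExecutor(Executors.newFixedThreadPool(4), 1);
        for (int i = 0; i < 6; i++) {
            int n = i;
            ex.submit("hostA", () -> System.out.println("hostA task " + n));
        }
        ex.shutdown();
    }
}
```

Note that a parked task is re-run on the same worker thread that freed the slot, so nothing ever has to poll the queue map from a dedicated management thread.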
This solution avoids significant performance overhead and does not require a separate thread for queue management.