I have a dask cluster with n workers and want the workers to execute database queries. But the database is capable of processing m queries in parallel, where m <n. How can I model this in dask.distributed? Only employees should work on such a task in parallel.
I have seen distributed locks support ( http://distributed.readthedocs.io/en/latest/api.html#distributed.Lock ). But with this, I could only execute one request at a time, and not m.
I also saw that I could define resources per employee ( https://distributed.readthedocs.io/en/latest/resources.html ). But this is also not suitable, since the database is not dependent on workers. I would either have to define 1 database resource per employee (which leads to too many concurrent queries). Or I would have to allocate database resources m with n workers, which makes cluster configuration and suboptimal execution difficult.
Is it possible to define something like semaphores in dask to solve this?
source share