Semaphores in dask.distributed?

I have a dask cluster with n workers and want the workers to execute database queries. But the database can only process m queries in parallel, where m < n. How can I model this in dask.distributed, so that only m workers run such a task in parallel?

I have seen that distributed supports locks ( http://distributed.readthedocs.io/en/latest/api.html#distributed.Lock ). But with a lock I could only execute one query at a time, not m.

I also saw that I can define resources per worker ( https://distributed.readthedocs.io/en/latest/resources.html ). But this does not fit either, since the database is not tied to particular workers. I would either have to define one database resource per worker (which allows too many concurrent queries), or I would have to spread m database resources across n workers, which makes cluster configuration difficult and execution suboptimal.
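For reference, the per-worker resources approach described above looks roughly like this (a sketch; the scheduler address and the resource name "DB" are illustrative):

```shell
# Each worker advertises one "DB" resource unit; a task submitted with
# resources={"DB": 1} then occupies that unit while it runs.
dask-worker scheduler-address:8786 --resources "DB=1"
```

With one unit per worker and n workers, this still permits n concurrent queries, which is exactly the mismatch the question describes.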

Is it possible to define something like semaphores in dask to solve this?

+5
2 answers

You could probably hack something together using Locks and Variables.

A cleaner solution would be to implement Semaphores the same way that Locks are implemented. Depending on your experience, this may not be too difficult (the lock implementation is about 150 lines), and a pull request would be welcome.

https://github.com/dask/distributed/blob/master/distributed/lock.py
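For what it's worth, newer releases of dask.distributed (2.6.0 and later) ship a built-in `Semaphore` that addresses this directly. A minimal sketch, assuming such a version is installed (the resource name "database" and the trivial query body are illustrative):

```python
from dask.distributed import Client, Semaphore

client = Client(processes=False)  # in-process cluster, just for the sketch
sem = Semaphore(max_leases=3, name="database")  # allow m = 3 concurrent queries

def run_query(q):
    # `with sem` blocks until one of the 3 leases is free, so at most
    # 3 run_query calls touch the "database" at any one time.
    with sem:
        return q * 2  # stand-in for the real database call

results = client.gather(client.map(run_query, range(5)))
client.close()
```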

+1

You can use dask.distributed.Queue

```python
import dask.distributed

class DDSemaphore(object):
    """Dask Distributed Semaphore built on a distributed queue of tokens."""

    def __init__(self, value=1):
        self._q = dask.distributed.Queue()
        # Pre-load one token per permit; the token value itself is arbitrary.
        for _ in range(value):
            self._q.put(42)

    def acquire(self):
        # Blocks until a token can be taken from the queue.
        self._q.get()

    def release(self):
        # Return a token, waking one blocked acquire() if any.
        self._q.put(42)
```
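The token-queue pattern above can be demonstrated locally with the standard library (a single-process analogue, not a distributed one) to show that m permits really do bound the concurrency:

```python
import queue
import threading
import time

class QueueSemaphore:
    """Local analogue of DDSemaphore: a FIFO queue holding one token per permit."""

    def __init__(self, value=1):
        self._q = queue.Queue()
        for _ in range(value):
            self._q.put(None)  # each token is one permit

    def acquire(self):
        self._q.get()  # blocks until a token is available

    def release(self):
        self._q.put(None)

sem = QueueSemaphore(value=2)   # m = 2 concurrent "queries"
state = {"active": 0, "peak": 0}
lock = threading.Lock()

def fake_query(i):
    sem.acquire()
    try:
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.05)  # stand-in for the database round trip
        with lock:
            state["active"] -= 1
    finally:
        sem.release()

threads = [threading.Thread(target=fake_query, args=(i,)) for i in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# state["peak"] never exceeds 2: the semaphore capped concurrency at m.
```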
0

Source: https://habr.com/ru/post/1275218/

