How to specify the number of threads/processes for the default dask scheduler

Is there a way to limit the number of cores used by the default scheduler (the one used by default with dask dataframes)?

With compute you can specify it using:

df.compute(get=dask.threaded.get, num_workers=20)

But I was wondering if there is a way to set this as the default, so you don't need to specify it for every compute call?

For example, this would be useful on a small cluster (say, 64 cores) that is shared with other people without a job scheduling system, where I don't want dask to occupy all the cores as soon as I start a computation.
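
(A note for readers on newer Dask releases: the get= keyword was later replaced by scheduler=, so the per-call form above would look roughly like the sketch below; the example dataframe and the 20-worker limit are just illustrative values.)

import pandas as pd
import dask.dataframe as dd

# Illustrative dataframe; in practice df is whatever dask dataframe you already have
df = dd.from_pandas(pd.DataFrame({"x": range(100)}), npartitions=4)

# Per-call limit with the newer scheduler= keyword (replaces get=dask.threaded.get)
result = df.x.sum().compute(scheduler="threads", num_workers=20)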

+4
1 answer

You can set a ThreadPool with a fixed number of workers as the default pool:

from multiprocessing.pool import ThreadPool
import dask

# Make a 20-thread pool the global default for the threaded scheduler
dask.set_options(pool=ThreadPool(20))
+5
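
If you are on a newer Dask version where dask.set_options no longer exists, the same global default is set through dask.config.set; a minimal sketch, again assuming a 20-thread limit:

from multiprocessing.pool import ThreadPool
import dask

# Global default: subsequent .compute() calls use this 20-thread pool
dask.config.set(pool=ThreadPool(20))

# Or cap the default threaded scheduler by worker count instead of passing a pool:
# dask.config.set(scheduler="threads", num_workers=20)

# dask.config.set also works as a context manager, which is handy on a shared
# machine when you only want the limit around one computation:
# with dask.config.set(pool=ThreadPool(20)):
#     result = df.compute()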

Source: https://habr.com/ru/post/1660874/

