How to specify the number of threads/processes for the default dask scheduler

Is there a way to limit the number of cores used by the default scheduler (the one used by default with dask dataframes)?

With compute you can specify it using:

df.compute(get=dask.threaded.get, num_workers=20)

But I was wondering if there is a way to set this as the default, so you don't need to specify it for every compute call?

For example, this would be useful on a small cluster (say, 64 cores) that is shared with other people without a job scheduling system, where I don't want dask to occupy all the cores as soon as I start a computation.
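
(A note for readers on newer Dask releases: the get= keyword was later replaced by scheduler=, so the per-call form above would look roughly like the sketch below; the example dataframe and the 20-worker limit are just illustrative values.)

import pandas as pd
import dask.dataframe as dd

# Illustrative dataframe; in practice df is whatever dask dataframe you already have
df = dd.from_pandas(pd.DataFrame({"x": range(100)}), npartitions=4)

# Per-call limit with the newer scheduler= keyword (replaces get=dask.threaded.get)
result = df.x.sum().compute(scheduler="threads", num_workers=20)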

+4
1 answer

You can set a ThreadPool with a fixed number of workers as the default pool:

from multiprocessing.pool import ThreadPool
import dask

# Make a 20-thread pool the global default for the threaded scheduler
dask.set_options(pool=ThreadPool(20))
+5
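
If you are on a newer Dask version where dask.set_options no longer exists, the same global default is set through dask.config.set; a minimal sketch, again assuming a 20-thread limit:

from multiprocessing.pool import ThreadPool
import dask

# Global default: subsequent .compute() calls use this 20-thread pool
dask.config.set(pool=ThreadPool(20))

# Or cap the default threaded scheduler by worker count instead of passing a pool:
# dask.config.set(scheduler="threads", num_workers=20)

# dask.config.set also works as a context manager, which is handy on a shared
# machine when you only want the limit around one computation:
# with dask.config.set(pool=ThreadPool(20)):
#     result = df.compute()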

Source: https://habr.com/ru/post/1660874/

