Is there a way to limit the number of cores used by the default scheduler (the one used by default with dask dataframes)?
With compute, you can specify it using:
df.compute(get=dask.threaded.get, num_workers=20)
But I was wondering if there is a way to set this as the default, so you don't need to specify it for every compute call?
For example, this would be useful on a small cluster (say, 64 cores) that is shared with other people (without a job scheduling system), where I don't want to occupy all the cores as soon as I start a computation with dask.
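For context, here is a minimal, self-contained sketch of the per-call approach from the snippet above (the toy DataFrame, column name, and the 20-worker cap are just placeholders; it uses the same get= keyword shown above):

    import dask
    import dask.dataframe as dd
    import pandas as pd

    # Toy data just for illustration; the real DataFrame is much larger.
    pdf = pd.DataFrame({"x": range(1000)})
    df = dd.from_pandas(pdf, npartitions=8)

    # What I do now: cap the threaded scheduler at 20 workers for this one call.
    result = df.x.sum().compute(get=dask.threaded.get, num_workers=20)

    # What I'm looking for: a way to make num_workers=20 the default,
    # so that a plain df.x.sum().compute() respects it as well.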