How to see the progress of a Dask Compute task?

Question


I would like to see progress in a Jupyter notebook while running a computation with Dask. I am counting all the values in the "id" column of a large CSV file (over 4 GB). Any ideas?

import dask.dataframe as dd

df = dd.read_csv('data/train.csv')
df.id.count().compute()


python-3.x jupyter-notebook distributed-computing dask

Ambigus9 Feb 28 '18 at 10:33


1 answer

Mocklin · Accepted Answer · 2018-02-28T22:38:21+0000

If you are using the single-machine scheduler, register a progress bar before calling compute:

from dask.diagnostics import ProgressBar
ProgressBar().register()

http://dask.pydata.org/en/latest/diagnostics-local.html

If you are using the distributed scheduler, persist the result and pass it to progress:

from dask.distributed import progress

result = df.id.count().persist()
progress(result)

Or just use the dashboard:

http://dask.pydata.org/en/latest/diagnostics-distributed.html
