How to see the progress of a Dask Compute task?

I would like to see progress in a Jupyter notebook when I run a computation with Dask. I am counting all the values of the "id" column in a large CSV file (over 4 GB). Any ideas?

import dask.dataframe as dd

df = dd.read_csv('data/train.csv')
df.id.count().compute()
1 answer

If you are using the single-machine scheduler, register a progress bar; it is then shown for every subsequent compute() call:

from dask.diagnostics import ProgressBar
ProgressBar().register()

http://dask.pydata.org/en/latest/diagnostics-local.html

If you use a distributed scheduler, do the following:

from dask.distributed import progress

result = df.id.count().persist()  # count is a method, so it must be called
progress(result)

Or just use the dashboard:

http://dask.pydata.org/en/latest/diagnostics-distributed.html


Source: https://habr.com/ru/post/1694266/

