Is it possible to set an index column when reading CSV using Python dask?

Question

When reading a CSV with pandas, you can specify an index column. Can Dask do the same at read time, rather than setting the index afterwards?

For example, using pandas:

df = pandas.read_csv(filename, index_col=0)

Ideally, the Dask equivalent would look like this:

df = dask.dataframe.read_csv(filename, index_col=0)

I tried

df = dask.dataframe.read_csv(filename).set_index(?)

but the index column has no name (and this seems slow).

python csv dataframe dask

Jaydog 12 sept '17 at 10:53

1 answer

Mocklin · Answer 1 · 2017-09-12T11:53:46+0000

No, these need to be two separate method calls. If you try to pass an index to read_csv, Dask tells you so in a helpful error message.

In [1]: import dask.dataframe as dd
In [2]: df = dd.read_csv('*.csv', index='my-index')
ValueError: Keyword 'index' not supported dd.read_csv(...).set_index('my-index') instead

But calling set_index after read_csv is neither slower nor faster than setting the index at read time would be.
