Is it possible to set an index column when reading CSV using Python dask?

When reading a CSV with pandas, you can specify an index column. Is this possible with Dask at read time, as opposed to setting the index afterwards?

For example, using pandas:

df = pandas.read_csv(filename, index_col=0)

Ideally, using dask might be as follows:

df = dask.dataframe.read_csv(filename, index_col=0)

I tried

df = dask.dataframe.read_csv(filename).set_index(?)

but the index column has no name (and this seems slow).

1 answer

No. In Dask, reading the file and setting the index are two separate steps. If you pass an index keyword to read_csv, Dask raises an error that tells you what to do instead.

In [1]: import dask.dataframe as dd
In [2]: df = dd.read_csv('*.csv', index='my-index')
ValueError: Keyword 'index' not supported dd.read_csv(...).set_index('my-index') instead

Setting the index in a separate step is neither slower nor faster than setting it during the read would be.


Source: https://habr.com/ru/post/1685521/