I am trying to convert one column of my data frame to a date and time. Following the discussion at https://github.com/dask/dask/issues/863, I tried the following code:
import pandas as pd
import dask.dataframe as dd
df['time'].map_partitions(pd.to_datetime, columns='time').compute()
But I get the following error message:
ValueError: Metadata inference failed, please provide 'meta' keyword
What exactly should I pass as meta? Should it be a dictionary of ALL the columns in df, or just the column 'time'? And what type should I use? I tried dtype and datetime64, but so far neither of them works.
Thank you, I appreciate your guidance.
Update 1
Here are the new error messages:
1) Using pd.Timestamp:
df['trd_exctn_dt'].map_partitions(pd.Timestamp).compute()
TypeError: Cannot convert input to Timestamp
2) Using pd.to_datetime with meta:
meta = ('time', pd.Timestamp)
df['time'].map_partitions(pd.to_datetime,meta=meta).compute()
TypeError: to_datetime() got an unexpected keyword argument 'meta'
3) Using pd.to_datetime alone: the computation gets stuck at 2%:
In [14]: df['trd_exctn_dt'].map_partitions(pd.to_datetime).compute()
[ ] | 2% Completed | 2min 20.3s
In addition, I would like to be able to specify the date format, as I would in pandas:
pd.to_datetime(df['time'], format='%m%d%Y')
Update 2
With Dask 0.11, passing meta no longer raises an error, but the computation is still stuck at 2%:
df['trd_exctn_dt'].map_partitions(pd.to_datetime, meta=meta).compute()
[ ] | 2% Completed | 30min 45.7s
Update 3
I also tried the following:
def parse_dates(df):
    return pd.to_datetime(df['time'], format='%m/%d/%Y')

df.map_partitions(parse_dates, meta=meta)