I think it should be easy, but I hit the wall. I have a dataset that has been imported into a pandas dataframe from a Stata.dta file. Some columns contain date information. The file frame contains 100,000 lines, but a sample is provided:
cat event_date total 0 G2 2006-03-08 16 1 G2 NaT NaN 2 G2 NaT NaN 3 G3 2006-03-10 16 4 G3 2006-08-04 12 5 G3 2006-12-28 13 6 G3 2007-05-25 10 7 G4 2006-03-10 13 8 G4 2006-08-06 19 9 G4 2006-12-30 16
Data is saved in datetime64 format:
>>> mydata[['cat','event_date','total']].dtypes cat object event_date datetime64[ns] total float64 dtype: object
All I would like to do is create a new column that gives the difference in days (not "us" or "ns" !!!) between event_date and the start date, for example, 2006-01-01. I tried the following:
>>> mydata['new'] = mydata['event_date'] - np.datetime64('2006-01-01')
... but I get a message:
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I also tried the lambda function, but this does not work either.
However, if I just wanted to add one day to each date, I can successfully use:
>>> mydata['plusone'] = mydata['event_date'] + np.timedelta64(1,'D')
It works great.
Did I miss something direct here?
Thanks in advance for your help.