Column arithmetic in pandas frame using dates

I think it should be easy, but I hit the wall. I have a dataset that has been imported into a pandas dataframe from a Stata.dta file. Some columns contain date information. The file frame contains 100,000 lines, but a sample is provided:

cat event_date total 0 G2 2006-03-08 16 1 G2 NaT NaN 2 G2 NaT NaN 3 G3 2006-03-10 16 4 G3 2006-08-04 12 5 G3 2006-12-28 13 6 G3 2007-05-25 10 7 G4 2006-03-10 13 8 G4 2006-08-06 19 9 G4 2006-12-30 16 

Data is saved in datetime64 format:

 >>> mydata[['cat','event_date','total']].dtypes cat object event_date datetime64[ns] total float64 dtype: object 

All I would like to do is create a new column that gives the difference in days (not "us" or "ns" !!!) between event_date and the start date, for example, 2006-01-01. I tried the following:

 >>> mydata['new'] = mydata['event_date'] - np.datetime64('2006-01-01') 

... but I get a message:

 TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' 

I also tried the lambda function, but this does not work either.

However, if I just wanted to add one day to each date, I can successfully use:

 >>> mydata['plusone'] = mydata['event_date'] + np.timedelta64(1,'D') 

It works great.

Did I miss something direct here?

Thanks in advance for your help.

+6
source share
2 answers

I don't know why numpy datetime64 incompatible with pandas dtypes, but using datetime objects worked fine for me:

 In [39]: import datetime as dt mydata['new'] = mydata['event_date'] - dt.datetime(2006,1,1) mydata Out[39]: cat event_date total new Index 0 G2 2006-03-08 16 66 days 1 G2 NaT NaN NaT 2 G2 NaT NaN NaT 3 G3 2006-03-10 16 68 days 4 G3 2006-08-04 12 215 days 5 G3 2006-12-28 13 361 days 6 G3 2007-05-25 10 509 days 7 G4 2006-03-10 13 68 days 8 G4 2006-08-06 19 217 days 9 G4 2006-12-30 16 363 days 
+6
source

Make sure you have an early version of pandas and numpy (> = 1.7):

 In [11]: df.event_date - pd.Timestamp('2006-01-01') Out[11]: 0 66 days 1 NaT 2 NaT 3 68 days 4 215 days 5 361 days 6 509 days 7 68 days 8 217 days 9 363 days Name: event_date, dtype: timedelta64[ns] 
+2
source

Source: https://habr.com/ru/post/973677/


All Articles