Arithmetic by date (not index) in Pandas

(Python 2.7, Pandas 0.9)

This seems like a simple thing, but I can't figure out how to calculate the difference between two date columns in a data frame using Pandas. The data frame already has an index, so turning either column into a DatetimeIndex is undesirable.

To convert each date column from strings, I used:

data.Date_Column = pd.to_datetime(data.Date_Column) 

From there, to get the elapsed time between two columns, I tried:

 data.Closed_Date - data.Created_Date 

which returns an error:

 TypeError: %d format: a number is required, not a numpy.timedelta64 

Checking the dtypes of both columns gives datetime64[ns], and the individual dates in the array are of type Timestamp.

What am I missing?

EDIT:

Here is an example where I create separate DatetimeIndex objects and can do what I want, but when I try the same thing inside the data frame, it fails.

 Created_Date = pd.DatetimeIndex(data['Created_Date'], copy=True)
 Closed_Date = pd.DatetimeIndex(data['Closed_Date'], copy=True)
 Closed_Date.day - Created_Date.day
 [Out] array([ -3, -16, 5, ..., 0, 0, 0])

Now the same thing in the data frame:

 data.Created_Date = pd.DatetimeIndex(data['Created_Date'], copy=True)
 data.Closed_Date = pd.DatetimeIndex(data.Closed_Date, copy=True)
 data.Created_Date.day - data.Created_Date.day
 AttributeError: 'Series' object has no attribute 'day'

Here is some of the data if you want to play with it:

 data['Created Date'][0:10].to_dict()
 {0: '1/1/2009 0:00', 1: '1/1/2009 0:00', 2: '1/1/2009 0:00', 3: '1/1/2009 0:00', 4: '1/1/2009 0:00', 5: '1/1/2009 0:00', 6: '1/1/2009 0:00', 7: '1/1/2009 0:00', 8: '1/1/2009 0:00', 9: '1/1/2009 0:00'}

 data['Closed Date'][0:10].to_dict()
 {0: '1/7/2009 0:00', 1: nan, 2: '1/1/2009 0:00', 3: '1/1/2009 0:00', 4: '1/1/2009 0:00', 5: '1/12/2009 0:00', 6: '1/12/2009 0:00', 7: '1/7/2009 0:00', 8: '1/10/2009 0:00', 9: '1/7/2009 0:00'}
1 answer

Update. A useful workaround is simply to wrap the column in the DatetimeIndex constructor (which is usually much faster than an apply), for example:

 DatetimeIndex(df['Created_Date']).day 
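As a minimal sketch of this workaround, the snippet below takes the day-of-month difference through DatetimeIndex while leaving the frame's own index untouched. The frame, its string index, and the `day_diff` column name are made up for illustration; `np.asarray` is used only so the result is assigned positionally whatever type `.day` returns in a given pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-date index, as in the question.
df = pd.DataFrame(
    {
        "Created_Date": pd.to_datetime(["2009-01-01", "2009-01-01"]),
        "Closed_Date": pd.to_datetime(["2009-01-07", "2009-01-12"]),
    },
    index=["a", "b"],
)

# Wrap each column in a DatetimeIndex to reach the .day field.
created_day = np.asarray(pd.DatetimeIndex(df["Created_Date"]).day)
closed_day = np.asarray(pd.DatetimeIndex(df["Closed_Date"]).day)

# Plain arrays are assigned by position, so the original index is preserved.
df["day_diff"] = closed_day - created_day
print(df)
```

Note that this compares day-of-month fields only, which is not the same as elapsed days when the two dates fall in different months.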

From pandas 0.15 onward, this is available through the dt accessor (along with other datetime methods):

 df['Created_Date'].dt.day 
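For the elapsed time the question asks about, a sketch along the following lines should work on a pandas version that has the dt accessor. The sample values are the first few rows from the question; the `elapsed_days` column name and the explicit `format` string are assumptions made for illustration.

```python
import pandas as pd

# A few rows from the question's sample data; None stands in for the missing value.
df = pd.DataFrame({
    "Created_Date": ["1/1/2009 0:00", "1/1/2009 0:00", "1/1/2009 0:00"],
    "Closed_Date": ["1/7/2009 0:00", None, "1/12/2009 0:00"],
})

# Parse the strings into datetime64 columns; missing values become NaT.
for col in ["Created_Date", "Closed_Date"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y %H:%M")

# Subtracting two datetime64 columns gives a timedelta64 column.
elapsed = df["Closed_Date"] - df["Created_Date"]

# .dt.days extracts whole days; rows involving NaT come out as NaN.
df["elapsed_days"] = elapsed.dt.days
print(df)
```

Unlike subtracting the `.day` fields, this measures actual elapsed days, so it stays correct across month boundaries.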

Your error came from syntax that, although one might hope it would work, does not:

 data.Created_Date.day - data.Created_Date.day
 AttributeError: 'Series' object has no attribute 'day'

For more complex selections like this, you can use apply:

 In [111]: df['sub'] = df.apply(lambda x: x['Created_Date'].day - x['Closed_Date'].day, axis=1)

 In [112]: df[['Created_Date','Closed_Date','sub']]
 Out[112]:
           Created_Date          Closed_Date  sub
 0  2009-01-07 00:00:00  2009-01-01 00:00:00    6
 1                  NaT  2009-01-01 00:00:00    9
 2  2009-01-01 00:00:00  2009-01-01 00:00:00    0
 3  2009-01-01 00:00:00  2009-01-01 00:00:00    0
 4  2009-01-01 00:00:00  2009-01-01 00:00:00    0
 5  2009-01-12 00:00:00  2009-01-01 00:00:00   11
 6  2009-01-12 00:00:00  2009-01-01 00:00:00   11
 7  2009-01-07 00:00:00  2009-01-01 00:00:00    6
 8  2009-01-10 00:00:00  2009-01-01 00:00:00    9
 9  2009-01-07 00:00:00  2009-01-01 00:00:00    6

Be careful: you should probably handle these NaTs separately:

 In [114]: df.ix[1][1].day # NaT.day
 Out[114]: -1
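One way to treat the NaT rows separately, sketched here under the assumption that missing dates should simply produce a missing result, is to apply the function only to rows where both dates are present. The frame and the `day_diff` helper are hypothetical; the point is to avoid relying on whatever value `NaT.day` returns in a given pandas version.

```python
import pandas as pd

# Hypothetical frame mirroring the first rows of the output above.
df = pd.DataFrame({
    "Created_Date": pd.to_datetime(["2009-01-07", None, "2009-01-12"]),
    "Closed_Date": pd.to_datetime(["2009-01-01", "2009-01-01", "2009-01-01"]),
})

# True only for rows where both dates are present.
valid = df["Created_Date"].notnull() & df["Closed_Date"].notnull()

def day_diff(row):
    # Day-of-month difference for a single row (illustrative helper).
    return row["Created_Date"].day - row["Closed_Date"].day

# Apply to the valid rows only; assignment aligns on the index,
# so the skipped rows are left as NaN in the new column.
df["sub"] = df[valid].apply(day_diff, axis=1)
print(df)
```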


Note. Similar behavior occurs with .days on a timedelta that involves NaT:

 In [115]: df['sub2'] = df.apply(lambda x: (x['a'] - x['b']).days, axis=1)

 In [116]: df['sub2'][1]
 Out[116]: 92505

Source: https://habr.com/ru/post/1447290/

