(Python 2.7, Pandas 0.9)
This seems like a simple thing, but I can't figure out how to calculate the difference between two date columns in a DataFrame using Pandas. The DataFrame already has an index, so turning either column into a DatetimeIndex is undesirable.
To convert each date column from strings, I used:
data.Date_Column = pd.to_datetime(data.Date_Column)
From there, to get the elapsed time between the two columns, I do:
data.Closed_Date - data.Created_Date
which returns an error:
TypeError: %d format: a number is required, not a numpy.timedelta64
Checking the dtypes of both columns gives datetime64[ns], and the individual dates in the array are of type Timestamp.
What am I missing?
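For comparison, here is a minimal sketch of the same subtraction on a much newer pandas release than 0.9 (that version gap is an assumption, and the two-row `frame` is a hypothetical stand-in for `data`). There, subtracting two datetime columns gives a timedelta Series without touching the index:

```python
import pandas as pd

# Hypothetical two-row stand-in for `data`; values follow the format of the real columns.
frame = pd.DataFrame({
    "Created_Date": ["1/1/2009 0:00", "1/1/2009 0:00"],
    "Closed_Date": ["1/7/2009 0:00", "1/12/2009 0:00"],
})

# Parse the strings as month/day/year, leaving the existing index alone.
for col in ["Created_Date", "Closed_Date"]:
    frame[col] = pd.to_datetime(frame[col], format="%m/%d/%Y %H:%M")

# Element-wise difference of two datetime columns is a timedelta Series.
elapsed = frame["Closed_Date"] - frame["Created_Date"]
print(elapsed)
```

If this runs cleanly on a newer install but not on 0.9, that would point at the old version rather than at the approach.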
EDIT:
Here is an example where I can create separate DatetimeIndex objects and do what I want, but when I try to do the same thing in the context of a data frame, it fails.
Created_Date = pd.DatetimeIndex(data['Created_Date'], copy=True)
Closed_Date = pd.DatetimeIndex(data['Closed_Date'], copy=True)
Closed_Date.day - Created_Date.day
[Out] array([ -3, -16, 5, ..., 0, 0, 0])
Now the same thing in the data frame:
data.Created_Date = pd.DatetimeIndex(data['Created_Date'], copy=True)
data.Closed_Date = pd.DatetimeIndex(data.Closed_Date, copy=True)
data.Created_Date.day - data.Created_Date.day
AttributeError: 'Series' object has no attribute 'day'
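The AttributeError comes from the column being stored as a Series, which does not expose `.day` directly the way a DatetimeIndex does. A rough sketch of the Series equivalent, assuming a pandas version that has the `.dt` accessor (it does not exist in 0.9) and using made-up dates:

```python
import pandas as pd

# Made-up dates, chosen so the two calculations below give different answers.
created = pd.Series(pd.to_datetime(["2009-01-04", "2009-01-30"]))
closed = pd.Series(pd.to_datetime(["2009-01-01", "2009-02-02"]))

# Day-of-month difference, the Series counterpart of Closed_Date.day - Created_Date.day.
day_of_month_diff = closed.dt.day - created.dt.day
print(day_of_month_diff.tolist())  # [-3, -28]

# Elapsed whole days, which is usually what "difference between dates" means.
elapsed_days = (closed - created).dt.days
print(elapsed_days.tolist())  # [-3, 3]
```

Note that subtracting `.day` values only compares the day-of-month numbers, so it goes wrong as soon as the two dates fall in different months; subtracting the columns themselves avoids that.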
Here is some of the data if you want to play with it:
data['Created Date'][0:10].to_dict()
{0: '1/1/2009 0:00', 1: '1/1/2009 0:00', 2: '1/1/2009 0:00', 3: '1/1/2009 0:00', 4: '1/1/2009 0:00', 5: '1/1/2009 0:00', 6: '1/1/2009 0:00', 7: '1/1/2009 0:00', 8: '1/1/2009 0:00', 9: '1/1/2009 0:00'}
data['Closed Date'][0:10].to_dict()
{0: '1/7/2009 0:00', 1: nan, 2: '1/1/2009 0:00', 3: '1/1/2009 0:00', 4: '1/1/2009 0:00', 5: '1/12/2009 0:00', 6: '1/12/2009 0:00', 7: '1/7/2009 0:00', 8: '1/10/2009 0:00', 9: '1/7/2009 0:00'}
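To make this reproducible, the ten rows above can be rebuilt into a small frame. This is only a sketch: the name `sample` is hypothetical, the month/day/year format string is my reading of the data, and it assumes a pandas version with the `.dt` accessor. The missing Closed Date in row 1 should carry through as a missing value rather than raise:

```python
import numpy as np
import pandas as pd

# The ten sample rows shown above, keyed by the original index labels.
created = {i: "1/1/2009 0:00" for i in range(10)}
closed = {
    0: "1/7/2009 0:00", 1: np.nan, 2: "1/1/2009 0:00", 3: "1/1/2009 0:00",
    4: "1/1/2009 0:00", 5: "1/12/2009 0:00", 6: "1/12/2009 0:00",
    7: "1/7/2009 0:00", 8: "1/10/2009 0:00", 9: "1/7/2009 0:00",
}

# `sample` is a hypothetical name for this cut-down copy of `data`.
sample = pd.DataFrame({
    "Created Date": pd.Series(created),
    "Closed Date": pd.Series(closed),
})

# Parse both columns; the nan in row 1 becomes NaT.
for col in ["Created Date", "Closed Date"]:
    sample[col] = pd.to_datetime(sample[col], format="%m/%d/%Y %H:%M")

# Elapsed whole days per row; the NaT row comes out as NaN.
sample["Elapsed Days"] = (sample["Closed Date"] - sample["Created Date"]).dt.days
print(sample)
```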