Vectorized operations in datetime column in pandas

Question


I want to take a column of datetime objects and return a column of integers giving the number of days from each date to today. I can do it the ugly way, but I'm looking for a more beautiful (and faster) way.

So, suppose I have a dataframe with a datetime column, for example:

11    2014-03-04 17:16:26+00:00
12    2014-03-10 01:35:56+00:00
13    2014-03-15 02:35:51+00:00
14    2014-03-20 05:55:47+00:00
15    2014-03-26 04:56:33+00:00
Name: datetime, dtype: object

And each element looks like this:

datetime.datetime(2014, 3, 4, 17, 16, 26, tzinfo=<UTC>)
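To experiment with this setup, here is a minimal sketch that rebuilds the sample column as object-dtype datetime objects (values and index labels copied from the printout above) and converts it with pd.to_datetime. The names raw and converted are illustrative only. Converting to a native datetime64 dtype is what makes the vectorized .dt accessor available.

```python
import datetime

import pandas as pd

# Rebuild the example column as plain Python datetime objects (object dtype).
# Values and index labels are copied from the printout above.
utc = datetime.timezone.utc
raw = pd.Series(
    [
        datetime.datetime(2014, 3, 4, 17, 16, 26, tzinfo=utc),
        datetime.datetime(2014, 3, 10, 1, 35, 56, tzinfo=utc),
        datetime.datetime(2014, 3, 15, 2, 35, 51, tzinfo=utc),
        datetime.datetime(2014, 3, 20, 5, 55, 47, tzinfo=utc),
        datetime.datetime(2014, 3, 26, 4, 56, 33, tzinfo=utc),
    ],
    index=range(11, 16),
    name="datetime",
    dtype=object,
)

# Convert to a native tz-aware datetime64 column; this unlocks the .dt accessor.
converted = pd.to_datetime(raw, utc=True)

print(raw.dtype)        # object
print(converted.dtype)
print(converted.dt.date.tolist()[0])  # 2014-03-04
```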

Say I want to calculate how many days ago each observation happened and return that as a plain integer. I know I can just use apply twice, but is there a vectorized / cleaner way?

import datetime

today = datetime.datetime.today().date()
df_dates = df['datetime'].apply(lambda x: x.date())  # first apply: datetime -> date
days_ago = today - df_dates

This gives a Series of dtype timedelta64[ns]:

11   56 days, 00:00:00
12   50 days, 00:00:00
13   45 days, 00:00:00
14   40 days, 00:00:00
15   34 days, 00:00:00
Name: datetime, dtype: timedelta64[ns]

And finally, to get plain integers (the second apply):

days_ago_as_int = days_ago.apply(lambda x: x.item().days)
days_ago_as_int
11    56
12    50
13    45
14    40
15    34
Name: datetime, dtype: int64
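For comparison, a minimal sketch of a fully vectorized version using the pandas .dt accessor, assuming the column has already been converted to a datetime64 dtype. "Today" is pinned to 2014-04-29 here (an assumption, chosen because it reproduces the day counts in the question); in real code you would use pd.Timestamp.now(tz="UTC").normalize(). Note that this counts whole days in UTC, whereas datetime.datetime.today() in the question uses the local clock.

```python
import pandas as pd

# Sample data in the question's format (tz-aware UTC timestamps).
s = pd.Series(
    pd.to_datetime(
        [
            "2014-03-04 17:16:26",
            "2014-03-10 01:35:56",
            "2014-03-15 02:35:51",
            "2014-03-20 05:55:47",
            "2014-03-26 04:56:33",
        ],
        utc=True,
    ),
    index=range(11, 16),
    name="datetime",
)

# Pinned "today" for reproducibility (assumption); in practice:
# today = pd.Timestamp.now(tz="UTC").normalize()
today = pd.Timestamp("2014-04-29", tz="UTC")

# normalize() drops the time of day (like x.date()), the subtraction is
# vectorized, and .dt.days extracts whole days as integers. No apply needed.
days_ago_as_int = (today - s.dt.normalize()).dt.days
print(days_ago_as_int.tolist())  # [56, 50, 45, 40, 34]
```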

Any thoughts?

Related questions that didn't quite address what I was asking:

Pandas Python-

Pandas

Karl D, when I try your approach, I get this:

import datetime as dt
import numpy as np

converted_dates = df['date'].values.astype('datetime64[D]')
today_date = np.datetime64(dt.date.today())
print(converted_dates)
print(today_date)
print(today_date - converted_dates)

[2014-01-16 00:00:00 
2014-01-19 00:00:00 
2014-01-22 00:00:00
2014-01-26 00:00:00
2014-01-29 00:00:00]

2014-04-30 00:00:00

[16189 days, 0:08:20.637994
16189 days, 0:08:20.637991
16189 days, 0:08:20.637988
16189 days, 0:08:20.637984
16189 days, 0:08:20.637981]


exp1orer · asked Apr 30 '14 at 2:13


Karl D. · Accepted Answer · 2014-04-30T02:42:14+0000

Something like this (assuming your column is named date)?

import datetime as dt
import numpy as np

df['foo'] = (np.datetime64(dt.date.today())
             - df['date'].values.astype('datetime64[D]'))
print(df)

                 date     foo
0 2014-03-04 17:16:26 56 days
1 2014-03-10 01:35:56 50 days
2 2014-03-15 02:35:51 45 days
3 2014-03-20 05:55:47 40 days
4 2014-03-26 04:56:33 34 days
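The key step here is the datetime64[D] cast: truncating both sides to day resolution makes the subtraction produce whole-day timedeltas. A small sketch of just that mechanism, using two timestamps from the question and pinning "today" to 2014-04-29 (an assumption, for reproducibility); stamps, days and delta are illustrative names.

```python
import datetime as dt

import numpy as np

# Two of the timestamps from the question, stored at second resolution.
stamps = np.array(["2014-03-04T17:16:26", "2014-03-26T04:56:33"], dtype="datetime64[s]")

# Casting to datetime64[D] drops the time of day (like calling .date()).
days = stamps.astype("datetime64[D]")

# np.datetime64 built from a datetime.date already has day resolution.
today = np.datetime64(dt.date(2014, 4, 29))  # pinned; in practice dt.date.today()

# Day minus day gives whole-day timedeltas with dtype timedelta64[D].
delta = today - days
print(days)         # ['2014-03-04' '2014-03-26']
print(delta.dtype)  # timedelta64[D]
```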

And if you want it as an int:

df['foo'] = (np.datetime64(dt.date.today())
             - df['date'].values.astype('datetime64[D]')).astype(int)
print(df)
                  date  foo
0 2014-03-04 17:16:26   56
1 2014-03-10 01:35:56   50
2 2014-03-15 02:35:51   45
3 2014-03-20 05:55:47   40
4 2014-03-26 04:56:33   34
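.astype(int) yields day counts here only because the timedeltas have day units. A minimal sketch of the contrast, assuming durations that were instead stored in nanoseconds (pandas' traditional default): there the integer cast counts nanoseconds, and floor-dividing by a one-day np.timedelta64 gives days regardless of the unit. Variable names are illustrative.

```python
import numpy as np

# Durations with day resolution, like the result of the datetime64[D] subtraction.
in_days = np.array([56, 50, 45, 40, 34], dtype="timedelta64[D]")

# With a day unit, the integer cast simply returns the day counts.
as_int = in_days.astype(np.int64)

# The same durations stored in nanoseconds: the integer cast now counts
# nanoseconds, so floor-divide by a one-day timedelta to get days instead.
in_ns = in_days.astype("timedelta64[ns]")
ns_counts = in_ns.astype(np.int64)
days_from_ns = in_ns // np.timedelta64(1, "D")

print(as_int)        # [56 50 45 40 34]
print(ns_counts[0])  # 4838400000000000
print(days_from_ns)  # [56 50 45 40 34]
```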

Or, if the dates are in the index:

print(np.datetime64(dt.date.today()) - df.index.values.astype('datetime64[D]'))

[56 50 45 40 34]
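A self-contained sketch of the index variant, alongside a pandas-native equivalent based on DatetimeIndex.normalize(). The frame and its "value" column are hypothetical filler, and "today" is pinned to 2014-04-29 (an assumption) so the result matches the numbers above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose DatetimeIndex holds the timestamps; "value" is filler.
df = pd.DataFrame(
    {"value": range(5)},
    index=pd.to_datetime(
        [
            "2014-03-04 17:16:26",
            "2014-03-10 01:35:56",
            "2014-03-15 02:35:51",
            "2014-03-20 05:55:47",
            "2014-03-26 04:56:33",
        ]
    ),
)

# Pinned "today" for reproducibility (assumption); in practice use the real date.
today = pd.Timestamp("2014-04-29")

# NumPy route from the answer: cast the index values to day resolution.
via_numpy = np.datetime64(today.date()) - df.index.values.astype("datetime64[D]")

# pandas-native route: normalize() floors each timestamp to midnight, and
# subtracting from a Timestamp yields a TimedeltaIndex with a .days attribute.
via_pandas = (today - df.index.normalize()).days

print(via_pandas.tolist())  # [56, 50, 45, 40, 34]
```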

Edit:

>>> print(df)

                 date
0 2014-03-04 17:16:26
1 2014-03-10 01:35:56
2 2014-03-15 02:35:51
3 2014-03-20 05:55:47
4 2014-03-26 04:56:33

Alternatively, put today's date in its own column and convert both columns to datetime64[D]:

>>> df['today'] = dt.date.today()
>>> df['foo'] = (df['today'].values.astype('datetime64[D]')
...              - df['date'].values.astype('datetime64[D]'))
>>> print(df)

                 date       today     foo
0 2014-03-04 17:16:26  2014-05-14 71 days
1 2014-03-10 01:35:56  2014-05-14 65 days
2 2014-03-15 02:35:51  2014-05-14 60 days
3 2014-03-20 05:55:47  2014-05-14 55 days
4 2014-03-26 04:56:33  2014-05-14 49 days
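A runnable reproduction of this edit, assuming a naive (timezone-free) date column and pinning the today column to 2014-05-14, the date shown in the output above. It relies on NumPy converting the object column of datetime.date values to datetime64[D].

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2014-03-04 17:16:26",
                "2014-03-10 01:35:56",
                "2014-03-15 02:35:51",
                "2014-03-20 05:55:47",
                "2014-03-26 04:56:33",
            ]
        )
    }
)

# Pinned to the date in the output above (assumption); in practice dt.date.today().
df["today"] = dt.date(2014, 5, 14)  # object column of datetime.date values

# Both sides are cast to day resolution, so the difference is in whole days.
df["foo"] = (
    df["today"].values.astype("datetime64[D]")
    - df["date"].values.astype("datetime64[D]")
)
print(df["foo"].dt.days.tolist())  # [71, 65, 60, 55, 49]
```

The separate today column is not strictly required: a scalar np.datetime64 broadcasts across the array in the same way, as in the first version of the answer.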
