How to calculate the difference per day between consecutive lines of dataframe pandas with the condition

Question

How to calculate the difference per day between consecutive lines of dataframe pandas with the condition

I have a pandas dataframe as shown below.

item_id date 101 2016-01-05 101 2016-01-21 121 2016-01-08 121 2016-01-22 128 2016-01-19 128 2016-02-17 131 2016-01-11 131 2016-01-23 131 2016-01-24 131 2016-02-06 131 2016-02-07

I want to calculate the difference of days between a date column, but with respect to the item_id column. First I want to sort the dataframe with the date grouping by item_id. It should look like this.

 item_id date 101 2016-01-05 101 2016-01-08 121 2016-01-21 121 2016-01-22 128 2016-01-17 128 2016-02-19 131 2016-01-11 131 2016-01-23 131 2016-01-24 131 2016-02-06 131 2016-02-07

Then I want to calculate the difference between dates again grouped by item_id So, the result should look like this

  item_id date day_difference 101 2016-01-05 0 101 2016-01-08 3 121 2016-01-21 0 121 2016-01-22 1 128 2016-01-17 0 128 2016-02-19 2 131 2016-01-11 0 131 2016-01-23 12 131 2016-01-24 1 131 2016-02-06 13 131 2016-02-07 1

For sorting, I used something like this

df.groupby('item_id').apply(lambda x: new_df.sort('date'))

But that did not work. I can calculate the difference between consecutive lines by following

(df['date'] - df['date'].shift(1))

But not for grouping with item_id

+5

python pandas

Neil Feb 21 '16 at 8:54

source share

1 answer

jezrael · Accepted Answer · 2016-02-21T09:13:04+0000

I think you can use:

 df['date'] = df.groupby('item_id')['date'].apply(lambda x: x.sort_values()) df['diff'] = df.groupby('item_id')['date'].diff() / np.timedelta64(1, 'D') df['diff'] = df['diff'].fillna(0) print df item_id date diff 0 101 2016-01-05 0 1 101 2016-01-21 16 2 121 2016-01-08 0 3 121 2016-01-22 14 4 128 2016-01-19 0 5 128 2016-02-17 29 6 131 2016-01-11 0 7 131 2016-01-23 12 8 131 2016-01-24 1 9 131 2016-02-06 13 10 131 2016-02-07 1

How to calculate the difference per day between consecutive lines of dataframe pandas with the condition

More articles: