How to calculate the difference per day between consecutive lines of dataframe pandas with the condition

I have a pandas dataframe as shown below.

item_id date 101 2016-01-05 101 2016-01-21 121 2016-01-08 121 2016-01-22 128 2016-01-19 128 2016-02-17 131 2016-01-11 131 2016-01-23 131 2016-01-24 131 2016-02-06 131 2016-02-07 

I want to calculate the difference of days between a date column, but with respect to the item_id column. First I want to sort the dataframe with the date grouping by item_id. It should look like this.

 item_id date 101 2016-01-05 101 2016-01-08 121 2016-01-21 121 2016-01-22 128 2016-01-17 128 2016-02-19 131 2016-01-11 131 2016-01-23 131 2016-01-24 131 2016-02-06 131 2016-02-07 

Then I want to calculate the difference between dates again grouped by item_id So, the result should look like this

  item_id date day_difference 101 2016-01-05 0 101 2016-01-08 3 121 2016-01-21 0 121 2016-01-22 1 128 2016-01-17 0 128 2016-02-19 2 131 2016-01-11 0 131 2016-01-23 12 131 2016-01-24 1 131 2016-02-06 13 131 2016-02-07 1 

For sorting, I used something like this

df.groupby('item_id').apply(lambda x: new_df.sort('date'))

But that did not work. I can calculate the difference between consecutive lines by following

(df['date'] - df['date'].shift(1))

But not for grouping with item_id

+5
source share
1 answer

I think you can use:

 df['date'] = df.groupby('item_id')['date'].apply(lambda x: x.sort_values()) df['diff'] = df.groupby('item_id')['date'].diff() / np.timedelta64(1, 'D') df['diff'] = df['diff'].fillna(0) print df item_id date diff 0 101 2016-01-05 0 1 101 2016-01-21 16 2 121 2016-01-08 0 3 121 2016-01-22 14 4 128 2016-01-19 0 5 128 2016-02-17 29 6 131 2016-01-11 0 7 131 2016-01-23 12 8 131 2016-01-24 1 9 131 2016-02-06 13 10 131 2016-02-07 1 
+5
source

Source: https://habr.com/ru/post/1243496/


All Articles