I have a pandas dataframe as shown below.
item_id date 101 2016-01-05 101 2016-01-21 121 2016-01-08 121 2016-01-22 128 2016-01-19 128 2016-02-17 131 2016-01-11 131 2016-01-23 131 2016-01-24 131 2016-02-06 131 2016-02-07
I want to calculate the difference of days between a date column, but with respect to the item_id column. First I want to sort the dataframe with the date grouping by item_id. It should look like this.
item_id date 101 2016-01-05 101 2016-01-08 121 2016-01-21 121 2016-01-22 128 2016-01-17 128 2016-02-19 131 2016-01-11 131 2016-01-23 131 2016-01-24 131 2016-02-06 131 2016-02-07
Then I want to calculate the difference between dates again grouped by item_id So, the result should look like this
item_id date day_difference 101 2016-01-05 0 101 2016-01-08 3 121 2016-01-21 0 121 2016-01-22 1 128 2016-01-17 0 128 2016-02-19 2 131 2016-01-11 0 131 2016-01-23 12 131 2016-01-24 1 131 2016-02-06 13 131 2016-02-07 1
For sorting, I used something like this
df.groupby('item_id').apply(lambda x: new_df.sort('date'))
But that did not work. I can calculate the difference between consecutive lines by following
(df['date'] - df['date'].shift(1))
But not for grouping with item_id
source share