I have a lot of user / item / time data. I want to know which items were consumed first, second, etc. by each user.
My questions are: if I have a data frame that is already sorted by time (descending), will it remain sorted by default through the groupby process? And how can I pull out the first two items consumed by each user, even if a user has consumed fewer than two items?
    import pandas as pd

    df = pd.DataFrame({'item_id': ['b', 'b', 'a', 'c', 'a', 'b'],
                       'user_id': [1, 2, 1, 1, 3, 1],
                       'time': range(6)})
    print(df)

    pd.get_dummies(df['item_id'])

    gp = df.groupby('user_id').head()
    print(gp)
This gives:
      item_id  time  user_id
    0       b     0        1
    1       b     1        2
    2       a     2        1
    3       c     3        1
    4       a     4        3
    5       b     5        1

              item_id  time  user_id
    user_id
    1       0       b     0        1
            2       a     2        1
            3       c     3        1
            5       b     5        1
    2       1       b     1        2
    3       4       a     4        3
Now I need to pull out the top two values of item_id for each user, something like this (keeping the user_id column is not essential):
    user_id  order  item_id
    1        0      b
             1      a
    2        0      b
    3        0      a
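For reference, here is a minimal sketch of one way to get a result in that shape. It is not necessarily the best approach. It relies on the documented behaviour that groupby keeps rows in their original order within each group, and on `cumcount()` to number the rows per user. The `order` column name and the `first_two` variable are just names chosen for this illustration, and the explicit sort is an assumption that "first" means lowest `time`.

```python
import pandas as pd

df = pd.DataFrame({'item_id': ['b', 'b', 'a', 'c', 'a', 'b'],
                   'user_id': [1, 2, 1, 1, 3, 1],
                   'time': range(6)})

# Sort explicitly (assumption: "first" means lowest time) so the result
# does not depend on how the frame happened to be ordered beforehand.
df = df.sort_values('time', kind='mergesort')

# Number the rows within each user's group: 0 = first item, 1 = second, ...
# groupby keeps the rows of each group in their existing order.
df['order'] = df.groupby('user_id').cumcount()

# Keep at most the first two rows per user; a user with a single row
# simply contributes that one row.
first_two = (df[df['order'] < 2]
             .set_index(['user_id', 'order'])
             .sort_index()['item_id'])

print(first_two)
```

Filtering on the `cumcount()` value should select the same rows as `df.groupby('user_id').head(2)`, but it also provides the per-user position needed for the `order` level of the index.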