Pandas - Check if the numbers in a column are consecutive

I have a pandas dataframe as follows:

    user_id  product_id  order_number
    1        1           1
    1        1           2
    1        1           3
    1        2           1
    1        2           5
    2        1           1
    2        1           3
    2        1           4
    2        1           5
    3        1           1
    3        1           2
    3        1           6
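For anyone who wants to reproduce this, the frame can be rebuilt with something like the snippet below. It is only a minimal sketch: the values are copied from the table above, and the variable name `df` is the one the answers use.

```python
import pandas as pd

# Values copied row by row from the table in the question.
df = pd.DataFrame({
    "user_id":      [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "product_id":   [1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1],
    "order_number": [1, 2, 3, 1, 5, 1, 3, 4, 5, 1, 2, 6],
})
```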

I want to query this df for the longest streak (a run in which no order_number is skipped) and the last streak (the run ending at the last order_number) of each user_id/product_id pair.

The ideal result is:

    user_id  product_id  longest_streak  last_streak
    1        1           3               3
    1        2           0               0
    2        1           3               3
    3        1           2               0

Any help would be appreciated.

3 answers

Using a loop and defaultdict

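    from collections import defaultdict
    import pandas as pd

    a = defaultdict(lambda: None)   # previous order_number per (user_id, product_id)
    longest = defaultdict(int)      # longest streak seen so far
    current = defaultdict(int)      # streak that is currently running

    for i, j, k in df.itertuples(index=False):
        if a[(i, j)] == k - 1:
            # consecutive order: a streak starts at 2, then grows by 1
            current[(i, j)] += 1 if current[(i, j)] else 2
            longest[(i, j)] = max(longest[(i, j)], current[(i, j)])
        else:
            # gap (or first order): reset the running streak
            current[(i, j)] = 0
            longest[(i, j)] |= 0    # make sure the key exists
        a[(i, j)] = k

    pd.concat(
        [pd.Series(d) for d in [longest, current]],
        axis=1, keys=['longest_streak', 'last_streak']
    ).rename_axis(['user_id', 'product_id']).reset_index()

       user_id  product_id  longest_streak  last_streak
    0        1           1               3            3
    1        1           2               0            0
    2        2           1               3            3
    3        3           1               2            0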

I'm still not quite sure how you define last_streak, but assuming that the same combination of user and product does not repeat, the longest streaks can be calculated like this:

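    import itertools

    def extract_streaks(data):
        # length of every run of consecutive diffs equal to 1
        streaks = [len(list(rows)) for d, rows in itertools.groupby(data) if d == 1.0]
        # n diffs of 1 in a row correspond to n + 1 consecutive orders
        return max(streaks) + 1 if streaks else 0

    # diff within each group, so the first row of a group is never
    # compared with the last row of the previous group
    df['diffs'] = df.groupby(['user_id', 'product_id']).order_number.diff()
    df.groupby(['user_id', 'product_id'])['diffs'].apply(extract_streaks)
    #user_id  product_id
    #1        1             3
    #         2             0
    #2        1             3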
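Since last_streak is left open above, here is one possible reading of it, offered only as a sketch. It assumes that a streak is a run of consecutive order_number values within one user_id/product_id pair, that a run of length 1 counts as 0, and that last_streak is the length of the run containing the final order. The helper name `streaks` and the intermediate column `run_len` are made up for illustration.

```python
import pandas as pd

def streaks(df):
    keys = ["user_id", "product_id"]
    df = df.sort_values(keys + ["order_number"])
    # Gap to the previous order inside the same group (NaN on a group's first row).
    gap = df.groupby(keys)["order_number"].diff()
    # A new run starts wherever the gap is not exactly 1.
    run_id = gap.ne(1).cumsum()
    # Length of the run each row belongs to.
    run_len = df.groupby(run_id)["order_number"].transform("count")
    # Assumption: a single isolated order is not a streak.
    run_len = run_len.where(run_len > 1, 0)
    out = (
        df.assign(run_len=run_len)
          .groupby(keys)["run_len"]
          .agg(longest_streak="max", last_streak="last")
    )
    return out.reset_index()

# Sample data copied from the question.
df = pd.DataFrame({
    "user_id":      [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "product_id":   [1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1],
    "order_number": [1, 2, 3, 1, 5, 1, 3, 4, 5, 1, 2, 6],
})
result = streaks(df)
```

On the sample data this reproduces the table asked for in the question, including the 2 / 0 row for user 3.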


You can try

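    import pandas as pd

    # one row per (user_id, product_id), one column per order_number
    s = df.assign(key=1).set_index(['user_id', 'product_id', 'order_number']).key.unstack()
    # label each run of present/absent orders, then blank out the absent cells
    s = s.notnull().astype(int).diff(axis=1).fillna(0).ne(0).cumsum(axis=1).mask(s.isnull())
    # size of each run, per row
    s = s.apply(lambda row: row.value_counts(), axis=1)
    # a run of length 1 is not a streak
    s = s.mask(s == 1, 0)
    pd.concat([s.max(axis=1), s.ffill(axis=1).iloc[:, -1]], axis=1)

    Out[974]:
                        0.0  2.0
    user_id product_id
    1       1           3.0  3.0
            2           0.0  0.0
    2       1           3.0  3.0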

Source: https://habr.com/ru/post/1276059/

