Pandas - Check if the numbers in a column are in a row

Question

I have a pandas dataframe as follows:

    user_id  product_id  order_number
          1           1             1
          1           1             2
          1           1             3
          1           2             1
          1           2             5
          2           1             1
          2           1             3
          2           1             4
          2           1             5
          3           1             1
          3           1             2
          3           1             6
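For anyone who wants to reproduce the answers, the sample data above can be built as a DataFrame along these lines. This is only a convenience sketch; the variable name `df` is assumed because the answers below use it.

```python
import pandas as pd

# Sample data copied row by row from the question.
df = pd.DataFrame({
    "user_id":      [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "product_id":   [1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1],
    "order_number": [1, 2, 3, 1, 5, 1, 3, 4, 5, 1, 2, 6],
})
```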

I want to query this df for the longest streak (no order_number is skipped) and the last streak (the streak that ends at the last order_number).

The ideal result is:

    user_id  product_id  longest_streak  last_streak
          1           1               3            3
          1           2               0            0
          2           1               3            3
          3           1               2            0

I would appreciate any insights.

+5

python pandas dataframe data-manipulation data-science

iLoeng Mar 25 '18 at 0:10

3 answers

I'm still not quite sure how you determined last_streak, but assuming the same combination of user and product does not repeat, the longest streaks can be calculated as follows:

    import itertools

    def extract_streaks(data):
        # Lengths of the runs of consecutive 1.0 differences
        streaks = [len(list(rows)) for d, rows in itertools.groupby(data) if d == 1.0]
        # A run of n unit steps covers n + 1 orders
        return max(streaks) + 1 if streaks else 0

    df['diffs'] = df.order_number.diff()
    df.groupby(['user_id', 'product_id'])['diffs'].apply(extract_streaks)
    #user_id  product_id
    #1        1             3
    #         2             0
    #2        1             3
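To see why the run-length trick works, here is a small standalone sketch of the same idea on a single group. The list `diffs` is hand-written for illustration and corresponds to order numbers 1, 3, 4, 5; it is not taken from the answer itself.

```python
import itertools

# Differences between consecutive order numbers 1, 3, 4, 5.
# The first difference is undefined, so it is NaN.
diffs = [float("nan"), 2.0, 1.0, 1.0]

# itertools.groupby collapses consecutive equal values into runs;
# only runs of 1.0 (no skipped order number) are kept.
runs = [len(list(rows)) for d, rows in itertools.groupby(diffs) if d == 1.0]

# A run of n unit steps spans n + 1 orders.
longest = max(runs) + 1 if runs else 0
print(runs, longest)
```

Here the only run of unit steps has length 2 (3 to 4, 4 to 5), so the longest streak is 3 orders.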

+1

Dyz Mar 25 '18 at 1:36

You can try

    s = df.assign(key=1).set_index(['user_id', 'product_id', 'order_number']).key.unstack()
    s = s.notnull().astype(int).diff(axis=1).fillna(0).ne(0).cumsum(axis=1).mask(s.isnull())
    s = s.apply(pd.value_counts, 1)
    s = s.mask(s == 1, 0)
    pd.concat([s.max(1), s.ffill(axis=1).iloc[:, -1]], 1)
    Out[974]:
                        0.0  2.0
    user_id product_id
    1       1           3.0  3.0
            2           0.0  0.0
    2       1           3.0  3.0

+1

Wen Mar 25 '18 at 2:29

piRSquared · Accepted Answer · 2018-03-25T06:49:12+0000

Using a loop and defaultdict

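    from collections import defaultdict

    a = defaultdict(lambda: None)   # last order_number seen per (user, product)
    longest = defaultdict(int)      # longest streak per (user, product)
    current = defaultdict(int)      # streak currently running per (user, product)

    for i, j, k in df.itertuples(index=False):
        if a[(i, j)] == k - 1:
            # Consecutive order: a fresh streak starts at 2, otherwise extend by 1
            current[(i, j)] += 1 if current[(i, j)] else 2
            longest[(i, j)] = max(longest[(i, j)], current[(i, j)])
        else:
            # Gap (or first row): reset the running streak
            current[(i, j)] = 0
            longest[(i, j)] |= 0    # make sure the key exists
        a[(i, j)] = k

    pd.concat(
        [pd.Series(d) for d in [longest, current]],
        axis=1, keys=['longest_streak', 'last_streak']
    ).rename_axis(['user_id', 'product_id']).reset_index()

       user_id  product_id  longest_streak  last_streak
    0        1           1               3            3
    1        1           2               0            0
    2        2           1               3            3
    3        3           1               2            0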
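For comparison, the same result can probably also be reached without an explicit Python loop. The following is a sketch of a different technique (labeling runs with a grouped `diff` and `cumsum`), not the method used in the answer above; the names `new_run`, `run_id`, `run_len` and `result` are made up for this example, and it assumes a streak of a single order counts as 0, as in the desired output.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id":      [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "product_id":   [1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1],
    "order_number": [1, 2, 3, 1, 5, 1, 3, 4, 5, 1, 2, 6],
})

keys = ["user_id", "product_id"]
df = df.sort_values(keys + ["order_number"])

# A new run starts wherever the order number does not directly follow
# the previous one within the same (user_id, product_id) group.
new_run = df.groupby(keys)["order_number"].diff().ne(1)
run_id = new_run.cumsum().rename("run_id")

# Number of orders in each run; a run of a single order counts as 0.
run_len = df.groupby(keys + [run_id]).size()
run_len = run_len.where(run_len > 1, 0)

result = pd.DataFrame({
    "longest_streak": run_len.groupby(level=keys).max(),
    "last_streak": run_len.groupby(level=keys).last(),
}).reset_index()
print(result)
```

Because `run_id` only ever increases, the last run of each group is also the one with the highest id, which is why `.last()` on the sorted groups yields the streak ending at the final order_number.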
