Pandas - group based on values from two columns

Question

I have this DataFrame:

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                          'fuz', 'baz', 'fuz', 'coo'],
                   'B' : ['one', 'one', 'two', 'two',
                          'three', 'three', 'four', 'one']})

It looks like this:

    A      B
0  foo    one
1  bar    one
2  foo    two
3  bar    two
4  fuz  three
5  baz  three
6  fuz   four
7  coo    one

I would like to create a new column, group. A group collects the rows that are linked to each other through shared values in column A or column B.

It starts from the unique values of one column, then looks at the other column to find values belonging to rows that are already in the group.

The result will look like this:

    A      B    group
0  foo    one     1
1  bar    one     1
2  foo    two     1
3  bar    two     1
4  fuz  three     2
5  baz  three     2
6  fuz   four     2
7  coo    one     1

In this example, we start with foo in column A. Every foo row goes into group 1. The associated values in B are one and two, so those rows are also in group 1.

The values one and two appear in column B next to foo, bar and coo in column A, so those rows are in group 1 as well.

The same principle gives us group 2.

What would be the best way to do this?

python pandas grouping

paulwasit 11 . '17 12:31

zipa · Answer 1 · 2017-04-11T12:44:05+0000

For the example data, which contains only two groups, this works:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                          'fuz', 'baz', 'fuz', 'coo'],
                   'B' : ['one', 'one', 'two', 'two',
                          'three', 'three', 'four', 'one']})
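# Rows whose A value is 'foo' seed group 1
g1 = df[df['A'] == 'foo']
# Group 1: rows sharing an A or B value with the seed rows; all other rows get group 2
df['group'] = np.where(df['A'].isin(g1['A']) | df['B'].isin(g1['B']), 1, 2)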

zhe · Answer 2 · 2017-04-11T13:11:34+0000

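Building on zipa's answer, here is a loop that keeps assigning groups until every row has one. The df below is modified so that it contains 3 groups.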

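import pandas as pd
import numpy as np

df = pd.DataFrame({'A' : ['foo', 'bae', 'foo', 'bar',
                          'fuz', 'baz', 'fzz', 'coo'],
                   'B' : ['one', 'one', 'two', 'two',
                          'three', 'three', 'four', 'one']})
df['group'] = [None]*len(df)  # no row has a group yet
i = 1
while True:
    # Seed the next group with the A value of the first ungrouped row
    value = df[df['group'].isnull()].iloc[0, 0]
    g1 = df[df['A'] == value]
    # Label every row that shares an A or B value with the seed rows
    df['group'] = np.where(df['A'].isin(g1['A']) | df['B'].isin(g1['B']), i, df['group'])
    if not df['group'].isnull().any():
        break
    i += 1
print(df)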

Result:

     A      B group
0  foo    one     1
1  bae    one     1
2  foo    two     1
3  bar    two     1
4  fuz  three     2
5  baz  three     2
6  fzz   four     3
7  coo    one     1
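
Note that the loop above follows only one hop from each seed value: it labels rows that share an A or B value with the seed rows, but it does not go on to follow the values of those newly labelled rows. If the links described in the question are meant to chain (foo to one, one to bar, bar to two, and so on), the problem amounts to finding connected components. The following is a minimal sketch under that assumption, using a small union-find. The names label_groups, find and union are hypothetical helpers, not part of pandas or of either answer.

```python
import pandas as pd


def label_groups(df, col_a="A", col_b="B"):
    """Return 1-based group labels; rows linked through shared values get the same label."""
    parent = {}

    def find(x):
        # Walk up to the root of x, shortening the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    # Each row links its A value to its B value. Values are tagged with the
    # column name so an identical string in A and B is not merged by accident.
    for a, b in zip(df[col_a], df[col_b]):
        union((col_a, a), (col_b, b))

    # Number the components in order of first appearance.
    numbers = {}
    labels = []
    for a in df[col_a]:
        root = find((col_a, a))
        labels.append(numbers.setdefault(root, len(numbers) + 1))
    return labels


df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'fuz', 'baz', 'fuz', 'coo'],
                   'B': ['one', 'one', 'two', 'two',
                         'three', 'three', 'four', 'one']})
df['group'] = label_groups(df)
print(df['group'].tolist())  # [1, 1, 1, 1, 2, 2, 2, 1]
```

On the original data from the question this reproduces the requested group column, and on the modified df with 'bae' and 'fzz' it gives the same three groups as the loop. The difference only shows up when a group has to be reached through more than one intermediate value.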
