Pandas Aggregate Group

I have a dataframe that looks conceptually as follows:

import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 3],
    "b": ["a", "a", "c", "a", "d", "a"],
    "c": ["2", "3", "4", "2", "3", "2"]
})

      a    b    c
  0   1   'a'  '2' 
  1   1   'a'  '3'
  2   1   'c'  '4'
  3   2   'a'  '2'
  4   2   'd'  '3'
  5   3   'a'  '2'

For each group of a, I need to count the unique (b, c) pairs that appear in that group and in all groups before it.

So, in this example, the output should be [3, 4, 4]

(because in group 1 there are 3 unique (b, c) pairs, in groups 1 and 2 together there are 4 unique (b, c) pairs, and in groups 1, 2 and 3 together there are still only 4 unique (b, c) pairs).

I tried using expanding with groupby and nunique, but I could not work out the syntax.

Any help would be appreciated!

4 answers

This is a difficult question. Is that what you are after?

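result = (
    df.a.drop_duplicates(keep='last')    # keep the last row of each group of a
    .reset_index()['index']              # row labels of those last rows
    # for each such label, count distinct b+c strings in all rows up to it
    .apply(lambda last: df.loc[df.index <= last].pipe(lambda part: (part.b + part.c).nunique()))
)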


result
Out[27]: 
0    3
1    4
2    4
Name: index, dtype: int64
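One caveat with concatenating b and c as strings: distinct pairs such as ("a", "12") and ("a1", "2") would collapse into the same value. A minimal sketch that sidesteps this by collecting tuples in a set, one group at a time. The helper name cumulative_unique_pairs is hypothetical and not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 3],
    "b": ["a", "a", "c", "a", "d", "a"],
    "c": ["2", "3", "4", "2", "3", "2"],
})

def cumulative_unique_pairs(frame):
    """Hypothetical helper: running count of distinct (b, c) pairs per group of a."""
    seen = set()
    counts = {}
    # sort=False keeps the groups in order of first appearance
    for key, group in frame.groupby("a", sort=False):
        seen.update(zip(group["b"], group["c"]))
        counts[key] = len(seen)
    return pd.Series(counts, name="n_unique")

result = cumulative_unique_pairs(df)
print(result.tolist())  # [3, 4, 4]
```

This loops in Python once per group, so it is probably only reasonable when the number of groups is modest.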

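First, get the index of the first occurrence of each unique (b, c) pair:

idx = df[['b','c']].drop_duplicates().index

Then count those rows per group of a and take the cumulative sum:

df.loc[idx].groupby('a')['b'].count().cumsum()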

a
1    3
2    4
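Note that group 3 is missing from this output: it contributes no new (b, c) pair, so it has no rows left after drop_duplicates. A possible fix, sketched under the assumption that every value of a should appear in the result, is to reindex over all groups with a zero fill before the cumulative sum:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 3],
    "b": ["a", "a", "c", "a", "d", "a"],
    "c": ["2", "3", "4", "2", "3", "2"],
})

# Rows holding the first occurrence of each (b, c) pair
first_seen = df.drop_duplicates(subset=["b", "c"])

# Number of new pairs introduced by each group;
# groups that add nothing are restored with a count of 0
new_per_group = (
    first_seen.groupby("a").size()
    .reindex(df["a"].unique(), fill_value=0)
)

result = new_per_group.cumsum()
print(result.tolist())  # [3, 4, 4]
```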

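Flag the first occurrence of each (b, c) pair, keep a running total, and take the last value in each group of a:

import numpy as np

# running count of distinct (b, c) pairs, row by row
df['t'] = np.cumsum(~df[['b','c']].duplicated())
# value at the last row of each group of a
df.groupby('a')['t'].last()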
Out[44]: 
a
1    3
2    4
3    4
Name: t, dtype: int64
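The same idea can be written without adding a helper column to df. A short sketch, assuming as above that the rows of each group of a are contiguous and in the desired order:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 3],
    "b": ["a", "a", "c", "a", "d", "a"],
    "c": ["2", "3", "4", "2", "3", "2"],
})

# True where a (b, c) pair appears for the first time
is_new = ~df.duplicated(subset=["b", "c"])

# Running total of new pairs, then the value at the last row of each group
result = is_new.cumsum().groupby(df["a"]).last()
print(result.tolist())  # [3, 4, 4]
```

Grouping by the Series df["a"] rather than a column name lets the intermediate result stay separate from df.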

You can use drop_duplicates after grouping and read the number of rows from shape:

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2],
    "b": ["a", "a", "c", "a", "d"],
    "c": ["2", "3", "4", "2", "3"]
})
result = df.groupby("a").apply(lambda x: x.drop_duplicates().shape[0])

If you want to convert the result to a list afterwards:

result.tolist()

The result will be [3, 2] with your example, because you have 3 unique pairs for the group a=1 and 2 unique pairs for the group a=2.
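A sketch of the same per-group count without apply, dropping duplicate rows first and then taking the group sizes. These are counts within each group, not the running totals across groups:

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2],
    "b": ["a", "a", "c", "a", "d"],
    "c": ["2", "3", "4", "2", "3"],
})

# Drop fully duplicated rows, then count the remaining rows per group of a
result = df.drop_duplicates().groupby("a").size()
print(result.tolist())  # [3, 2]
```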

If you want the number of unique pairs of values in columns "b" and "c" across the whole dataframe:

df[["b", "c"]].drop_duplicates().shape[0]

Source: https://habr.com/ru/post/1692910/

