Sample each group after a pandas groupby

I know someone must have answered this already, but I just could not find it.

Problem: how to sample from each group after a groupby operation.

    import pandas as pd

    df = pd.DataFrame({'a': [1,2,3,4,5,6,7], 'b': [1,1,1,0,0,0,0]})
    grouped = df.groupby('b')
    # now sample from each group, e.g., I want 30% of each group
2 answers

Use apply with a lambda that calls sample with the frac parameter:

    In [2]: df = pd.DataFrame({'a': [1,2,3,4,5,6,7], 'b': [1,1,1,0,0,0,0]})
            grouped = df.groupby('b')
            grouped.apply(lambda x: x.sample(frac=0.3))
    Out[2]:
         a  b
    b
    0 6  7  0
    1 2  3  1
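
To make the draw repeatable and to see how many rows frac=0.3 actually yields on groups this small, here is a minimal sketch. The random_state value and the variable names sampled and sizes are my own additions, not part of the answer above. As far as I know, sample rounds frac * len(group) to the nearest integer, so both groups end up with one row.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7], 'b': [1, 1, 1, 0, 0, 0, 0]})
grouped = df.groupby('b')

# A fixed random_state (an arbitrary choice here) makes the draw repeatable.
sampled = grouped.apply(lambda x: x.sample(frac=0.3, random_state=0))

# The result is indexed by (group key, original row label),
# so counting rows per outer index level gives the sample size per group.
# Group b == 0 has 4 rows: round(0.3 * 4) = 1.
# Group b == 1 has 3 rows: round(0.3 * 3) = 1.
sizes = sampled.groupby(level=0).size().to_dict()
print(sizes)
```

On larger groups the sample sizes track the requested fraction more closely. With only three or four rows per group, the rounding dominates.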

Sample a fraction of each group

You can use GroupBy.apply with sample. You do not need a lambda; apply forwards keyword arguments to the function:

    frac = .3
    df.groupby('b').apply(pd.DataFrame.sample, frac=frac)

         a  b
    b
    0 6  7  0
    1 0  1  1

If the MultiIndex is not needed, pass group_keys=False to groupby:

    df.groupby('b', group_keys=False).apply(pd.DataFrame.sample, frac=.3)

       a  b
    6  7  0
    2  3  1
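
As a side note, assuming pandas 1.1 or newer, the groupby object has its own sample method, which skips apply and keeps the original row labels without needing group_keys=False. A minimal sketch (random_state is only there so the example is repeatable):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7], 'b': [1, 1, 1, 0, 0, 0, 0]})

# DataFrameGroupBy.sample (assumes pandas >= 1.1) draws within each group.
# The result keeps the original flat index, so no MultiIndex appears.
sampled = df.groupby('b').sample(frac=0.3, random_state=0)
print(sampled)
```

It also accepts n instead of frac when a fixed number of rows per group is wanted.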

Sample N rows from each group

apply is slow. If you need a fixed number of rows from each group, you can shuffle the DataFrame first and then use GroupBy.head.

    df.sample(frac=1).groupby('b').head(2)

       a  b
    2  3  1
    5  6  0
    1  2  1
    4  5  0
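
Here is a small sketch of the shuffle-then-head idea with a check on the per-group counts. The names N, shuffled and picked and the random_state value are mine, chosen for illustration. It also shows one difference worth knowing: when N exceeds a group's size, head just returns the whole group, while sample(n=N) without replacement would raise a ValueError.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7], 'b': [1, 1, 1, 0, 0, 0, 0]})
N = 2

# Shuffle all rows once, then keep the first N rows seen for each group.
# random_state is only here so the example is repeatable.
shuffled = df.sample(frac=1, random_state=0)
picked = shuffled.groupby('b').head(N)

# Both groups have at least N rows, so each contributes exactly N.
counts = picked['b'].value_counts().to_dict()

# Asking for more rows than a group holds does not raise:
# group b == 1 has only 3 rows, so it contributes all 3.
picked_large = shuffled.groupby('b').head(4)
counts_large = picked_large['b'].value_counts().to_dict()

print(counts, counts_large)
```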

This is equivalent to df.groupby('b', group_keys=False).apply(pd.DataFrame.sample, n=N), but faster:

    %timeit df.groupby('b', group_keys=False).apply(pd.DataFrame.sample, n=2)
    # 3.19 ms ± 90.5 µs

    %timeit df.sample(frac=1).groupby('b').head(2)
    # 1.56 ms ± 103 µs
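
If you are not in IPython, a rough way to repeat the comparison is the standard-library timeit module. This is only a sketch: the helper names with_apply and with_head and the run count are my own, and the absolute numbers will differ from the ones quoted above depending on machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7], 'b': [1, 1, 1, 0, 0, 0, 0]})


def with_apply():
    # Per-group sampling through apply.
    return df.groupby('b', group_keys=False).apply(pd.DataFrame.sample, n=2)


def with_head():
    # Shuffle once, then take the first two rows of each group.
    return df.sample(frac=1).groupby('b').head(2)


runs = 50  # arbitrary; raise it for steadier numbers
t_apply = timeit.timeit(with_apply, number=runs) / runs
t_head = timeit.timeit(with_head, number=runs) / runs

print(f"apply: {t_apply * 1e3:.2f} ms per call")
print(f"head:  {t_head * 1e3:.2f} ms per call")
```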

Source: https://habr.com/ru/post/1246359/

