Get the percentage of rows that satisfy a specific condition in a pandas DataFrame

Question

I have this data frame:

import pandas as pd

df = pd.DataFrame({
    "A": ["Used", "Not used", "Not used", "Not used", "Used", "Not used", "Used", "Used", "Used", "Not used"],
    "B": ["Used", "Used", "Used", "Not used", "Not used", "Used", "Not used", "Not used", "Used", "Not used"],
})

I would like the fastest, cleanest way to find the following:

The percentage of all rows where A is "Used".
The percentage of all rows where B is "Used".
The percentage of all rows where both A and B are "Used".

I am new to Python and pandas (and to coding in general), so I'm sure this is very simple, but any recommendations would be appreciated. I tried groupby().aggregate(sum), but I did not get the result I needed (I assume because the values are strings, not integers).

+5

python pandas pandas-groupby

Badatcoding Sep 29 '17 at 11:20


2 answers

Use eq to build a boolean mask, sum it, and divide by the number of rows:

1) Used A

 In [4929]: 100.*df.A.eq('Used').sum()/df.shape[0]
 Out[4929]: 50.0

2) Used B

 In [4930]: 100.*df.B.eq('Used').sum()/df.shape[0]
 Out[4930]: 50.0

3) Used A and used B

 In [4931]: 100.*(df.B.eq('Used') & df.A.eq('Used')).sum()/df.shape[0]
 Out[4931]: 20.0

1) is equivalent to

 In [4933]: 100.*(df['A'] == 'Used').sum()/len(df.index)
 Out[4933]: 50.0
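The session lines above assume that pandas is imported and df already exists. As a minimal self-contained sketch of the same sum-and-divide approach (the variable names n_rows, pct_a, pct_b and pct_both are illustrative, not from the answer):

```python
import pandas as pd

# Data frame from the question.
df = pd.DataFrame({
    "A": ["Used", "Not used", "Not used", "Not used", "Used",
          "Not used", "Used", "Used", "Used", "Not used"],
    "B": ["Used", "Used", "Used", "Not used", "Not used",
          "Used", "Not used", "Not used", "Used", "Not used"],
})

n_rows = len(df.index)  # same value as df.shape[0]

# eq() returns a boolean Series; summing it counts the True values.
pct_a = 100.0 * df["A"].eq("Used").sum() / n_rows
pct_b = 100.0 * df["B"].eq("Used").sum() / n_rows

# Combine two masks element-wise with & to require both conditions.
pct_both = 100.0 * (df["A"].eq("Used") & df["B"].eq("Used")).sum() / n_rows

print(pct_a, pct_b, pct_both)
```

The parentheses around the combined mask matter: & binds more tightly than comparison operators, so masks written with == need to be wrapped before they are combined.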

+5

Zero Sep 29 '17 at 11:23


jezrael · Accepted Answer · 2017-09-29T11:24:13+0000

Use value_counts with normalize=True to get the percentages for every value in a column. For several columns, use groupby with size to count each pair of values, then divide by the length of df (the same as the length of its index):

 print (100 * df['A'].value_counts(normalize=True))
 Not used    50.0
 Used        50.0
 Name: A, dtype: float64

 print (100 * df['B'].value_counts(normalize=True))
 Not used    50.0
 Used        50.0
 Name: B, dtype: float64

 print (100 * df.groupby(['A','B']).size() / len(df.index))
 A         B
 Not used  Not used    20.0
           Used        30.0
 Used      Not used    30.0
           Used        20.0
 dtype: float64
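As a possible alternative to the groupby step (not part of the original answer), pd.crosstab with normalize=True produces the same pairwise percentages as a two-way table. This is a sketch; the name pct_table is illustrative:

```python
import pandas as pd

# Data frame from the question.
df = pd.DataFrame({
    "A": ["Used", "Not used", "Not used", "Not used", "Used",
          "Not used", "Used", "Used", "Used", "Not used"],
    "B": ["Used", "Used", "Used", "Not used", "Not used",
          "Used", "Not used", "Not used", "Used", "Not used"],
})

# normalize=True divides every cell count by the total number of rows,
# so each cell is the fraction of rows with that (A, B) combination.
pct_table = 100 * pd.crosstab(df["A"], df["B"], normalize=True)

print(pct_table)

# Look up a single combination: rows where both A and B are "Used".
print(pct_table.loc["Used", "Used"])
```

The table has one row per value of A and one column per value of B, which can be easier to read than the stacked Series when there are only two columns.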

If you need to filter by specific values, create a boolean mask and take its mean; True values are treated as 1:

 print (100 * df['A'].eq('Used').mean())
 #alternative
 #print (100 * (df['A'] == 'Used').mean())
 50.0

 print (100 * df['B'].eq('Used').mean())
 #alternative
 #print (100 * (df['B'] == 'Used').mean())
 50.0

 print (100 * (df['A'].eq('Used') & df['B'].eq('Used')).mean())
 20.0
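The mask-and-mean idea can be wrapped in a small helper and extended to any number of columns. The following is only a sketch: percent_true is a hypothetical helper name, and using all(axis=1) to require the condition in every selected column is one way to generalize the & shown above.

```python
import pandas as pd

# Data frame from the question.
df = pd.DataFrame({
    "A": ["Used", "Not used", "Not used", "Not used", "Used",
          "Not used", "Used", "Used", "Used", "Not used"],
    "B": ["Used", "Used", "Used", "Not used", "Not used",
          "Used", "Not used", "Not used", "Used", "Not used"],
})


def percent_true(mask: pd.Series) -> float:
    """Return the percentage of True values in a boolean Series.

    Hypothetical helper: the mean of a boolean Series is the
    fraction of True values, because True counts as 1 and False as 0.
    """
    return 100.0 * mask.mean()


pct_a = percent_true(df["A"].eq("Used"))
pct_b = percent_true(df["B"].eq("Used"))

# Compare several columns at once, then require a match in every column.
used_in_all = df[["A", "B"]].eq("Used").all(axis=1)
pct_both = percent_true(used_in_all)

print(pct_a, pct_b, pct_both)
```

Replacing all(axis=1) with any(axis=1) would instead give the percentage of rows where at least one of the selected columns is "Used".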
