Get the percentage of rows that satisfy a specific condition in a pandas DataFrame

I have this data frame:

    import pandas as pd

    df = pd.DataFrame({
        "A": ["Used", "Not used", "Not used", "Not used", "Used", "Not used", "Used", "Used", "Used", "Not used"],
        "B": ["Used", "Used", "Used", "Not used", "Not used", "Used", "Not used", "Not used", "Used", "Not used"],
    })

I would like the fastest, cleanest way to compute the following:

  • The percentage of all rows that used A.
  • The percentage of all rows that used B.
  • The percentage of all rows that used both A and B.

I am new to Python and pandas (and to coding in general), so I'm sure this is very simple, but any recommendations would be appreciated. I tried groupby().aggregate(sum), but I did not get the result I needed (I assume because the values are strings, not integers).

2 answers

Use value_counts with normalize=True to get the percentage of every value in a single column. For several columns, use groupby with size to count each combination of values, then divide by the length of df (the same as the length of its index):

    print (100 * df['A'].value_counts(normalize=True))
    Not used    50.0
    Used        50.0
    Name: A, dtype: float64

    print (100 * df['B'].value_counts(normalize=True))
    Not used    50.0
    Used        50.0
    Name: B, dtype: float64

    print (100 * df.groupby(['A','B']).size() / len(df.index))
    A         B
    Not used  Not used    20.0
              Used        30.0
    Used      Not used    30.0
              Used        20.0
    dtype: float64
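As a possible alternative to the groupby step (not part of the original answer), pd.crosstab with normalize="all" should give the same combination shares as a two-way table. This is a minimal sketch; the names table, pct_a, pct_b and pct_both are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["Used", "Not used", "Not used", "Not used", "Used",
          "Not used", "Used", "Used", "Used", "Not used"],
    "B": ["Used", "Used", "Used", "Not used", "Not used",
          "Used", "Not used", "Not used", "Used", "Not used"],
})

# Each cell is the share of ALL rows with that (A, B) combination, in percent.
table = pd.crosstab(df["A"], df["B"], normalize="all") * 100
print(table)

# Rows are indexed by the values of A, columns by the values of B.
pct_both = table.loc["Used", "Used"]   # used A and B
pct_a = table.loc["Used"].sum()        # used A, regardless of B
pct_b = table["Used"].sum()            # used B, regardless of A
print(pct_a, pct_b, pct_both)
```

The table form is convenient when all four combinations are needed at once; for a single percentage, the mask approach is shorter.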

If you only need specific values, create a boolean mask and take its mean; each True is treated as 1:

    print (100 * df['A'].eq('Used').mean())
    #alternative
    #print (100 * (df['A'] == 'Used').mean())
    50.0

    print (100 * df['B'].eq('Used').mean())
    #alternative
    #print (100 * (df['B'] == 'Used').mean())
    50.0

    print (100 * (df['A'].eq('Used') & df['B'].eq('Used')).mean())
    20.0
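The same mask idea can be applied to the whole frame at once. The sketch below assumes every column of interest holds the strings "Used" / "Not used"; the names used, pct_each and pct_all are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["Used", "Not used", "Not used", "Not used", "Used",
          "Not used", "Used", "Used", "Used", "Not used"],
    "B": ["Used", "Used", "Used", "Not used", "Not used",
          "Used", "Not used", "Not used", "Used", "Not used"],
})

# Boolean frame: True wherever the cell equals "Used".
used = df.eq("Used")

# Column-wise mean of booleans gives the fraction of True values per column.
pct_each = 100 * used.mean()

# all(axis=1) is True only for rows where every column is True (A and B).
pct_all = 100 * used.all(axis=1).mean()

print(pct_each)
print(pct_all)
```

Building the boolean frame once avoids repeating the comparison for every column, which may matter when there are many more columns than A and B.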

Use eq to build a boolean mask, sum it, and divide by the number of rows:

1) Used A

    In [4929]: 100.*df.A.eq('Used').sum()/df.shape[0]
    Out[4929]: 50.0

2) Used B

    In [4930]: 100.*df.B.eq('Used').sum()/df.shape[0]
    Out[4930]: 50.0

3) Used A and used B

    In [4931]: 100.*(df.B.eq('Used') & df.A.eq('Used')).sum()/df.shape[0]
    Out[4931]: 20.0

1) is equivalent to

    In [4933]: 100.*(df['A'] == 'Used').sum()/len(df.index)
    Out[4933]: 50.0
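Both answers rely on the same fact: a boolean Series adds up like a column of 1s and 0s, whereas the original columns hold strings, so there is nothing numeric to add. A short sketch of that equivalence follows; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["Used", "Not used", "Not used", "Not used", "Used",
          "Not used", "Used", "Used", "Used", "Not used"],
    "B": ["Used", "Used", "Used", "Not used", "Not used",
          "Used", "Not used", "Not used", "Used", "Not used"],
})

# Comparing against "Used" turns the strings into True/False.
mask = df["A"] == "Used"

# Explicit 1/0 integers, to show what the boolean arithmetic amounts to.
as_int = mask.astype(int)

pct_from_sum = 100 * mask.sum() / len(df)           # count of True / row count
pct_from_mean = 100 * mask.mean()                   # mean of True/False values
pct_from_int = 100 * as_int.sum() / len(as_int)     # same, on explicit integers

print(pct_from_sum, pct_from_mean, pct_from_int)
```

Whether to write sum()/len(df) or mean() is mostly a matter of taste; mean() is shorter, while the explicit division makes the denominator visible.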

Source: https://habr.com/ru/post/1272218/