Frequency and percentage of unevenly sized groups in a seaborn barplot

I am trying to show the relative percentage for a group as well as the total frequency in sns barplot. The two groups that I'm comparing are very different in size, so I show the percentage for the groups in the function below.

Here is the code for the sample I created, which has group sizes ("groups") similar to those in my data across the target categorical variable ("item"). "rand" is just the variable that I use to create the df.

    # import pandas, seaborn and numpy
    import pandas as pd
    import seaborn as sns
    import numpy as np

    # create dataframe
    foobar = pd.DataFrame(np.random.randn(100, 3), columns=('groups', 'item', 'rand'))
    # let 'groups' and 'item' hold strings
    # (newer pandas versions warn about or reject strings in float columns)
    foobar = foobar.astype({'groups': object, 'item': object})

    # get relative group sizes
    for row, val in enumerate(foobar.rand):
        if val > -1.2:
            foobar.loc[row, 'groups'] = 'A'
        else:
            foobar.loc[row, 'groups'] = 'B'

        # assign categories that I am comparing graphically
        if row < 20:
            foobar.loc[row, 'item'] = 'Z'
        elif row < 40:
            foobar.loc[row, 'item'] = 'Y'
        elif row < 60:
            foobar.loc[row, 'item'] = 'X'
        elif row < 80:
            foobar.loc[row, 'item'] = 'W'
        else:
            foobar.loc[row, 'item'] = 'V'
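As a side note, the same kind of sample can be built without a row-by-row loop. The following is only a sketch of one way to do it; the fixed seed and the names `rng` and `n` are assumptions added for reproducibility and are not part of the original code.

```python
import numpy as np
import pandas as pd

# Fixed seed (an assumption, not in the original) so group sizes are reproducible
rng = np.random.default_rng(0)
n = 100
rand = rng.standard_normal(n)

foobar = pd.DataFrame({
    # about 88% of standard-normal draws exceed -1.2, so 'A' dominates
    'groups': np.where(rand > -1.2, 'A', 'B'),
    # rows 0-19 -> 'Z', 20-39 -> 'Y', 40-59 -> 'X', 60-79 -> 'W', 80-99 -> 'V'
    'item': np.repeat(list('ZYXWV'), n // 5),
    'rand': rand,
})

# absolute size of each group
print(foobar['groups'].value_counts())
```

Building the columns as arrays up front also avoids writing strings into columns that were created as floats.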

Here is a function I wrote that compares relative frequencies across groups. It has some default arguments, but I override them for this question.

    def percent_categorical(item, df=IA, grouper='Active Status'):
        # plot categorical responses to an item ('column name')
        # by percent by group ('diff column name w categorical data')
        # select a data frame (default is IA)
        # 'Active Status' is default grouper

        # create df of item grouped by status
        grouped = (df.groupby(grouper)[item]
            # convert to percentage by group rather than total count
            .value_counts(normalize=True)
            # rename column
            .rename('percentage')
            # multiply by 100 for easier interpretation
            .mul(100)
            # change order from value to name
            .reset_index()
            .sort_values(item))

        # create plot
        PercPlot = sns.barplot(x=item, y='percentage', hue=grouper, data=grouped,
                               palette='RdBu').set_xticklabels(
            labels=grouped[item].value_counts().index.tolist(), rotation=90)

        # show plot
        return PercPlot

The following is the function call and the resulting graph:

 percent_categorical('item', df=foobar, grouper='groups') 

[Image: the result of the execution of my function]

This is good because it allows me to show the relative percentage for the group. However, I also want to display absolute numbers for each group, preferably in a legend. In this case, I would like it to show that there are 89 members of group A and 11 members of group B.

Thanks in advance for any help.

1 answer

I solved this by splitting the groupby operation in two: one part to get the percentages and one to count the number of objects in each group.
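To make the split concrete, here is a minimal sketch on a tiny made-up frame (the frame `toy` and its values are hypothetical, chosen only so the numbers are easy to check by hand). One groupby object is reused for both the absolute counts and the within-group percentages.

```python
import pandas as pd

# Tiny hypothetical frame: group 'A' has 4 rows, group 'B' has 2
toy = pd.DataFrame({
    'groups': ['A', 'A', 'A', 'A', 'B', 'B'],
    'item':   ['X', 'X', 'X', 'Y', 'X', 'Y'],
})

# shared starting point for both calculations
groupbase = toy.groupby('groups')['item']

# absolute number of rows per group (used later for the legend)
groupcount = groupbase.count()

# share of each item within its own group, in percent
groupper = (groupbase.value_counts(normalize=True)
            .rename('percentage')
            .mul(100)
            .reset_index())

print(groupcount)
print(groupper)
```

Here `groupcount` is indexed by group name, so `groupcount['A']` can be looked up directly from a legend label.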

I adjusted your percent_categorical function as follows:

    # needed for plt.subplots() below
    import matplotlib.pyplot as plt

    def percent_categorical(item, df=IA, grouper='Active Status'):
        # plot categorical responses to an item ('column name')
        # by percent by group ('diff column name w categorical data')
        # select a data frame (default is IA)
        # 'Active Status' is default grouper

        # create groupby of item grouped by status
        groupbase = df.groupby(grouper)[item]
        # count the number of occurrences
        groupcount = groupbase.count()
        # convert to percentage by group rather than total count
        groupper = (groupbase.value_counts(normalize=True)
            # rename column
            .rename('percentage')
            # multiply by 100 for easier interpretation
            .mul(100)
            # change order from value to name
            .reset_index()
            .sort_values(item))

        # create plot (hue is the grouping column name, data is the percentage frame)
        fig, ax = plt.subplots()
        brplt = sns.barplot(x=item, y='percentage', hue=grouper, data=groupper,
                            palette='RdBu', ax=ax).set_xticklabels(
            labels=groupper[item].value_counts().index.tolist(), rotation=90)

        # get the handles and the labels of the legend
        # these are the bars and the corresponding text in the legend
        thehandles, thelabels = ax.get_legend_handles_labels()

        # for each label, add the total number of occurrences
        # you can get this from groupcount as the labels in the figure have
        # the same name as the values in the grouper column of your df
        for counter, label in enumerate(thelabels):
            # the new label looks like this (dummy name and value)
            # 'XYZ (42)'
            thelabels[counter] = label + ' ({})'.format(groupcount[label])

        # add the new legend to the figure
        ax.legend(thehandles, thelabels)

        # show plot
        return fig, ax, brplt

To get your figure:

 fig, ax, brplt = percent_categorical('item', df=foobar, grouper='groups') 

The resulting graph is as follows:

[Image: output]

You can change the look of this legend as you wish; I just added parentheses as an example.
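For instance, the label text could be produced by a small helper that takes a format template. This is only a sketch: `relabel` and its `template` parameter are hypothetical names introduced here, and the counts are the ones mentioned in the question.

```python
def relabel(labels, counts, template='{label} (n={count})'):
    """Return new legend labels with the group size appended.

    `labels` are the legend texts; `counts` maps each label to its
    group size (a dict or a pandas Series indexed by label both work).
    """
    return [template.format(label=lab, count=counts[lab]) for lab in labels]


# Counts taken from the question's example (89 in A, 11 in B)
counts = {'A': 89, 'B': 11}
new_labels = relabel(['A', 'B'], counts)
print(new_labels)  # ['A (n=89)', 'B (n=11)']
```

Inside the function, the result would be passed to the legend call in place of the modified list, for example `ax.legend(thehandles, relabel(thelabels, groupcount))`.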


Source: https://habr.com/ru/post/1269232/

