Overlaying actual data on a boxplot from a pandas data frame

I use Seaborn to create boxplots from pandas data frames. Seaborn boxplots seem to read the data frame in essentially the same way as the pandas boxplot functionality (so I hope the solution is the same for both, but I could just use dataframe.boxplot instead). There are 12 columns in my data frame, and the following code generates a single plot with one box for each column (just as dataframe.boxplot() would).

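import matplotlib.pyplot as plt
import seaborn as sns

# Set the style before creating the axes so that it takes effect.
sns.set_style("darkgrid", {"axes.facecolor": "darkgrey"})
fig, ax = plt.subplots()
pal = sns.color_palette("husl", 12)  # one colour per column
sns.boxplot(data=dataframe, palette=pal, ax=ax)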

Can someone suggest an easy way to overlay all the values (column by column) on a boxplot created from a data frame? I would appreciate any help with this.

+4
3 answers

Here is a general solution for a boxplot of the entire data frame. It should work for both seaborn and pandas, since both are based on matplotlib under the hood. I will use the pandas plot as an example, assuming import matplotlib.pyplot as plt is already in place. Since you already have an ax, it would be better to use ax.text(...) instead of plt.text(...).

In [35]:    
print(df)
         V1        V2        V3        V4        V5
0  0.895739  0.850580  0.307908  0.917853  0.047017
1  0.931968  0.284934  0.335696  0.153758  0.898149
2  0.405657  0.472525  0.958116  0.859716  0.067340
3  0.843003  0.224331  0.301219  0.000170  0.229840
4  0.634489  0.905062  0.857495  0.246697  0.983037
5  0.573692  0.951600  0.023633  0.292816  0.243963

[6 rows x 5 columns]

In [34]:    
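df.boxplot()
# x positions 1..n_columns, each repeated once per row; the values are flattened
# column by column (order='F') so that each one lands on its own column's box.
for x, y in zip(np.repeat(np.arange(df.shape[1])+1, df.shape[0]),
                df.values.ravel(order='F')):
    plt.text(x, y, '%.3f' % y, ha='center', va='center')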
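The advice about using ax.text(...) on an existing ax could look roughly like the sketch below. The helper name label_boxplot, the fmt parameter and the random sample data are made up for illustration; they are not part of pandas or matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def label_boxplot(df, ax, fmt="{:.3f}"):
    """Draw a boxplot of df on ax and write every value at its box (hypothetical helper)."""
    df.boxplot(ax=ax)
    texts = []
    # Boxes sit at x = 1, 2, ..., n_columns, so number the columns from 1.
    for x, col in enumerate(df.columns, start=1):
        for y in df[col]:
            texts.append(ax.text(x, y, fmt.format(y), ha="center", va="center"))
    return texts


# Sample data with the same shape as the frame printed above (values are random).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((6, 5)), columns=["V1", "V2", "V3", "V4", "V5"])

fig, ax = plt.subplots()
labels = label_boxplot(df, ax)
# Call plt.show() or fig.savefig(...) to look at the result.
```

Looping over the columns one at a time avoids having to flatten the array in the right order, at the cost of a nested loop.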


For one series in the data frame, several small changes are required:

In [35]:    
sub_df = df.V1
pd.DataFrame(sub_df).boxplot()
# a single box sits at x = 1
for y in sub_df.values:
    plt.text(1, y, '%.3f' % y, ha='center', va='center')


Overlaying the values as a scatter plot works the same way:

# for the whole data frame
df.boxplot()
plt.scatter(np.repeat(np.arange(df.shape[1])+1, df.shape[0]), df.values.ravel(order='F'), marker='+', alpha=0.5)
# for just one column
sub_df = df.V1
pd.DataFrame(sub_df).boxplot()
plt.scatter(np.repeat(1, df.shape[0]), sub_df.values, marker='+', alpha=0.5)


To overlay anything on a boxplot, we first need to work out where each box sits along the x axis. The boxes appear to be at 1, 2, 3, 4, and so on. So we want the values from the first column plotted at x = 1, those from the second column at x = 2, etc.

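One way to do this is to use np.repeat to repeat each of the numbers 1, 2, 3, 4, ... n times, where n is the number of rows, and use the result as the x coordinates. The y coordinates are the flattened data values, e.g. from df.values.ravel(), flattened column by column so that they line up with x.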

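Overlaying text needs one more step (a loop), because each call places a single string at a single x, y position.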
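The coordinate construction described above can be checked on a tiny frame. This is only a sketch; the two-column frame and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"V1": [0.1, 0.2, 0.3], "V2": [1.1, 1.2, 1.3]})
n_rows, n_cols = df.shape

# One x position per value: 1 repeated n_rows times, then 2, and so on.
x = np.repeat(np.arange(n_cols) + 1, n_rows)

# Flatten column by column (Fortran order) so that y[i] belongs to box x[i].
y = df.to_numpy().ravel(order="F")

print(x.tolist())  # [1, 1, 1, 2, 2, 2]
print(y.tolist())  # [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]
```

A plain ravel() flattens row by row and would give [0.1, 1.1, 0.2, ...], which puts V2 values on the V1 box, so the order argument matters here.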

+2

This is not built into seaborn.boxplot, but seaborn.violinplot offers something similar:

import numpy as np
import seaborn as sns

x = np.random.randn(30, 6)
sns.violinplot(data=x, inner="point")  # "point" draws every observation inside the violin
sns.despine(trim=True)


+6

Create some sample data:

data = np.random.randn(6,5)

df = pd.DataFrame(data,columns = list('ABCDE'))

Now assign a dummy column to df:
df['Group'] = 'A'

print(df)

          A         B         C         D         E Group
0  0.590600  0.226287  1.552091 -1.722084  0.459262     A
1  0.369391 -0.037151  0.136172 -0.772484  1.143328     A
2  1.147314 -0.883715 -0.444182 -1.294227  1.503786     A
3 -0.721351  0.358747  0.323395  0.165267 -1.412939     A
4 -1.757362 -0.271141  0.881554  1.229962  2.526487     A
5 -0.006882  1.503691  0.587047  0.142334  0.516781     A

Now use df.groupby(...).boxplot() and you are done:

df.groupby('Group').boxplot()

Box plot overlay

0

Source: https://habr.com/ru/post/1536270/

