Strange behavior of matplotlib's boxplot when using the notch shape

I am seeing some weird behavior in matplotlib's boxplot when I use the "notch" shape. I am using code that I wrote some time ago, and I never had these problems before. I wonder what the problem is. Any ideas?

[Image: weird behavior on notched boxplots]

When I turn the notch shape off, it looks fine, though:

[Image: unnotched boxplots look normal]

This is the code:

    import matplotlib.pyplot as plt

    def boxplot_modified(data):
        fig = plt.figure(figsize=(8, 6))
        ax = plt.subplot(111)

        bplot = plt.boxplot(data,
                            #notch=True,        # notch shape
                            vert=True,          # vertical box alignment
                            sym='ko',           # black circles for outliers
                            patch_artist=True,  # fill with color
                            )

        # choosing custom colors to fill the boxes
        # (zip below stops at the shorter of boxes and colors)
        colors = 3*['lightgreen'] + 3*['lightblue'] + ['lightblue', 'lightblue', 'lightblue']
        for patch, color in zip(bplot['boxes'], colors):
            patch.set_facecolor(color)

        # modifying the whiskers: straight lines, black, wider
        for whisker in bplot['whiskers']:
            whisker.set(color='black', linewidth=1.2, linestyle='-')

        # making the caps a little bit wider
        for cap in bplot['caps']:
            cap.set(linewidth=1.2)

        # hiding axis ticks (booleans; the 'on'/'off' strings are not accepted by newer matplotlib)
        plt.tick_params(axis="both", which="both", bottom=False, top=False,
                        labelbottom=True, left=False, right=False, labelleft=True)

        # adding horizontal grid lines
        ax.yaxis.grid(True)

        # remove axis spines
        ax.spines["top"].set_visible(False)
        ax.spines["right"].set_visible(False)
        ax.spines["bottom"].set_visible(True)
        ax.spines["left"].set_visible(True)

        plt.xticks([y+1 for y in range(len(data))], 8*['x'])

        # raised title
        #plt.text(2, 1, 'Modified',
        #         horizontalalignment='center',
        #         fontsize=18)

        plt.tight_layout()
        plt.show()

    boxplot_modified(df.values)

And when I make a plain plot without any customization, the problem still occurs:

    import matplotlib.pyplot as plt

    def boxplot(data):
        fig = plt.figure(figsize=(8, 6))
        ax = plt.subplot(111)

        bplot = plt.boxplot(data,
                            notch=True,         # notch shape
                            vert=True,          # vertical box alignment
                            sym='ko',           # black circles for outliers
                            patch_artist=True,  # fill with color
                            )
        plt.show()

    boxplot(df.values)

[Image: notch plot without customization still looks weird]

1 answer

Well, as it turns out, this is actually the correct behavior ;)

From Wikipedia:

Notched box plots apply a "notch" or narrowing of the box around the median. Notches are useful in offering a rough guide to the significance of the difference of medians; if the notches of two boxes do not overlap, this offers evidence of a statistically significant difference between the medians. The width of the notches is proportional to the interquartile range of the sample and inversely proportional to the square root of the size of the sample. However, there is uncertainty about the most appropriate multiplier (as this may vary depending on the similarity of the variances of the samples). One convention is to use +/- 1.58 * IQR / sqrt(n).

This was also discussed in an issue on GitHub; R produces a similar result, which is evidence that this behavior is "correct."

So if we see this strange "flipped" appearance in notched box plots, it simply means that the lower bound of the median's confidence interval lies below the 1st quartile, and likewise that the upper bound lies above the 3rd quartile. Although this looks ugly, it is actually useful information about the (un)certainty of the median.

Bootstrapping (random sampling with replacement to estimate parameters of the sampling distribution, here: confidence intervals) can reduce this effect:
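To make the "flipped" case concrete, here is a minimal sketch that applies the +/- 1.58 * IQR / sqrt(n) convention quoted above and checks whether the notch extends past the quartiles. The function name `notch_report`, the sample data, and the choice of the "inclusive" quartile method are assumptions made for illustration; matplotlib's own computation may use a slightly different constant or quartile definition.

```python
import math
import statistics


def notch_report(sample, factor=1.58):
    """Compute quartiles and notch bounds for one sample.

    Uses the convention median +/- factor * IQR / sqrt(n).
    'flipped' is True when the notch extends beyond the box.
    """
    n = len(sample)
    # Quartiles by linear interpolation (an assumed quartile definition)
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    med = statistics.median(sample)
    half_width = factor * (q3 - q1) / math.sqrt(n)
    lower, upper = med - half_width, med + half_width
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "notch": (lower, upper),
        "flipped": lower < q1 or upper > q3,
    }


small = list(range(1, 10))    # 9 observations
large = list(range(1, 101))   # 100 observations

print(notch_report(small)["flipped"])
print(notch_report(large)["flipped"])
```

With only a handful of observations the 1/sqrt(n) term is large, so the notch is wider than the distance from the median to the quartiles; with more observations it shrinks back inside the box.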

From the plt.boxplot documentation:

bootstrap: None (default) or integer. Specifies whether to bootstrap the confidence intervals around the median for notched boxplots. If bootstrap == None, no bootstrapping is performed, and notches are calculated using a Gaussian-based asymptotic approximation (see McGill, R., Tukey, J.W., and Larsen, W.A., 1978, and Kendall and Stuart, 1967). Otherwise, bootstrap specifies the number of times to bootstrap the median to determine its 95% confidence intervals. Values between 1000 and 10000 are recommended.
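As a rough illustration of what bootstrapping the median means, the following sketch resamples the data with replacement and takes the 2.5th and 97.5th percentiles of the resampled medians. The function name `bootstrap_median_ci`, the fixed seed, and the simple percentile indexing are assumptions for this example; it is not matplotlib's internal implementation.

```python
import random
import statistics


def bootstrap_median_ci(sample, n_resamples=2000, seed=0):
    """Approximate 95% percentile-bootstrap interval for the median."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(sample)
    # Median of each resample drawn with replacement, sorted ascending
    medians = sorted(
        statistics.median(rng.choices(sample, k=n))
        for _ in range(n_resamples)
    )
    lo_idx = int(0.025 * n_resamples)
    hi_idx = int(0.975 * n_resamples) - 1
    return medians[lo_idx], medians[hi_idx]


sample = list(range(1, 10))
print(bootstrap_median_ci(sample))
```

In the plotting code from the question, the corresponding change would be to pass the documented parameter, for example `plt.boxplot(data, notch=True, bootstrap=5000)`. Whether this removes the flipped notches depends on the data.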


Source: https://habr.com/ru/post/976541/

