Is there a binning bug in matplotlib histograms, or a problem with the randomness of the rvs method in scipy.stats?

The following code consistently produces histograms with empty bins, even when the number of samples is large. The empty bins appear at regular intervals, yet they have the same width as the other bins. This is obviously wrong, so why is it happening? It seems that either the rvs method is not random or the binning procedure is broken. In addition, try changing the number of bins to 50 and another oddity appears: roughly every other bin has too high a count.
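Both symptoms can be reproduced without any randomness by counting how many distinct integer values each bin can catch, since Poisson samples are always integers. The sketch below assumes, hypothetically, that the samples span the integers 60 to 140 (a plausible range for 10000 draws from Poisson(100)) and uses numpy.histogram, whose equal-width edges from the minimum to the maximum match what plt.hist builds for an integer bin count.

```python
import numpy as np

# Hypothetical sample range: every Poisson(100) draw is an integer, assumed here to lie in 60..140.
values = np.arange(60, 141)

summary = {}
for n_bins in (100, 50):
    # Equal-width edges spanning min..max, as used for an integer number of bins.
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Number of distinct integer values that fall into each bin.
    per_bin, _ = np.histogram(values, bins=edges)
    # summary[n_bins][k] = how many bins contain exactly k integer values.
    summary[n_bins] = np.bincount(per_bin)
    print(f"{n_bins} bins of width {edges[1] - edges[0]:.2f}: {summary[n_bins]}")
```

With 100 bins of width 0.8, 19 bins cannot contain any integer and therefore stay empty no matter how many samples are drawn. With 50 bins of width 1.6, 31 bins catch two integer values and 19 catch only one, so tall and short bins interleave in a repeating pattern.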

""" An example of how to plot histograms using matplotlib
This example samples from a Poisson distribution, plots the histogram
and overlays the Gaussian with the same mean and standard deviation

"""

from scipy.stats import poisson
from scipy.stats import norm
from matplotlib import pyplot as plt
#import matplotlib.mlab as mlab

EV = 100   # the expected value of the distribution
bins = 100 # number of bins in our histogram
n = 10000
RV = poisson(EV)  # Define a Poisson-distributed random variable

samples = RV.rvs(n)  # draw n random variates (a NumPy array of integers) from that random variable

events, edges, patches = plt.hist(samples, bins, density=True, histtype='stepfilled')  # make a histogram (density=True is the current name for normed=True)

print(events)  # When I run this, some bins are empty, even when the number of samples is large

# the pyplot.hist method returns a tuple containing three items. These are events, an array containing
# the height of each bin (raw counts, or normalized densities when density=True); edges, an array
# containing the lower edge of each bin, whose final element is the upper edge of the final bin;
# and patches, the artist objects that draw the histogram, which we don't need at any rate
# note that we really only need the edges array, but we need to unpack all three elements of the tuple
# for things to work properly, so events and patches here are really just dummy variables

mean = RV.mean()  # If we didn't know these values already, the mean and std methods are convenience
sd = RV.std()     # methods that allow us to retrieve the mean and standard deviation for any random variable

print("Mean is:", mean, " SD is: ", sd)

#print edges

Y = norm.pdf(edges, mean, sd)  # this is how to do it with the sciPy version of a normal PDF
# edges is an array, so this returns an array Y with the normal pdf value at each element of edges

binwidth = (len(edges) - 1) / (max(edges) - min(edges))  # inverse bin width: there are len(edges) - 1 bins
Y = Y * binwidth
print("Binwidth is:", 1 / binwidth)
# The above is a fix to "de-normalize" the normal distribution to properly reflect the bin widths

#Q = [edges[i+1] - edges[i] for i in range(len(edges)-1)]
#print Q  # This was to confirm that the bins are equally sized, which seems to be the case.

plt.plot(edges, Y)
plt.show()
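For reference, with density=True the values returned as events are not counts: each one is the bin's count divided by the total number of samples times the bin width, so the bars enclose a total area of 1. The sketch below checks that relation with numpy.histogram on a small, made-up integer sample (the data are illustrative, not from the question).

```python
import numpy as np

# Made-up integer data standing in for Poisson samples.
samples = np.array([3, 4, 4, 5, 5, 5, 6, 6, 7])
edges = np.linspace(samples.min(), samples.max(), 7)  # 6 equal-width bins

counts, _ = np.histogram(samples, bins=edges)
density, _ = np.histogram(samples, bins=edges, density=True)

widths = np.diff(edges)
# density=True divides each count by (total number of samples * bin width) ...
manual = counts / (counts.sum() * widths)
print(np.allclose(density, manual))                 # True
# ... so the bars enclose a total area of 1.
print(np.isclose((density * widths).sum(), 1.0))    # True
```

Since norm.pdf also returns a probability density, its values are on the same scale as these heights; comparing against raw counts would instead require multiplying the pdf by the number of samples and the bin width.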


Answer:

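Your data are integers (a Poisson RV), so the bins have to be aligned with integer values. With 100 bins over this data's range, each bin is narrower than 1, so some bins contain no integer at all and always stay empty; with 50 bins, each bin is wider than 1, so some bins catch two integer values and others only one. Choose the bins so that each one covers exactly one integer, for example: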

plt.hist(samples,
         range=(0, samples.max()),
         bins=samples.max() + 1,
         density=True, histtype='stepfilled')
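A variant of the same idea, sketched under the assumption that the samples are non-negative integers: put the edges at half-integers, so that every bin has width exactly 1 and is centred on one integer value. (With range=(0, samples.max()) and samples.max() + 1 bins, the width is samples.max() / (samples.max() + 1), slightly less than 1; that still gives one integer per bin but shifts the density scale a little.) The check below uses numpy.histogram and a fixed random_state, chosen only for reproducibility, so it runs without a display.

```python
import numpy as np
from scipy.stats import poisson

n = 10000
# Fixed seed only so the run is reproducible.
samples = poisson(100).rvs(size=n, random_state=0)

# Edges at k - 0.5 for k = min..max+1: one unit-wide bin centred on each integer.
edges = np.arange(samples.min(), samples.max() + 2) - 0.5
counts, _ = np.histogram(samples, bins=edges)

# Each bin now holds exactly the samples equal to one integer value,
# which is what np.bincount computes directly.
expected = np.bincount(samples)[samples.min():]
print(np.array_equal(counts, expected))  # True
print(np.allclose(np.diff(edges), 1.0))  # True
```

To plot, the same edges can be passed as plt.hist(samples, bins=edges, density=True, histtype='stepfilled'); each bar height is then the empirical frequency of one integer and can be compared directly with norm.pdf(k, mean, sd) at the integer centres.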


Source: https://habr.com/ru/post/1524089/

