How to build a probability mass function in Python

How can I create a histogram that shows the probability distribution given by an array x of numbers between 0 and 1? I expect each bar to be <= 1, and the y values of all bars should sum to 1.

For example, if x = [.2, .2, .8], then I expect a graph showing 2 bars, one at .2 with a height of .66, one at 0.8 with a height of .33.

I tried:

    matplotlib.pyplot.hist(x, bins=50, normed=True)

which gives me a histogram with bars that go above 1. I'm not saying that is wrong, since it is what the normed parameter does according to the documentation, but it does not show probabilities.

I also tried:

    counts, bins = numpy.histogram(x, bins=50, density=True)
    bins = bins[:-1] + (bins[1] - bins[0])/2
    matplotlib.pyplot.bar(bins, counts, 1.0/50)

which also gives bars whose y values add up to more than 1.

2 answers

I think my original terminology was off. I have an array of continuous values in [0, 1] that I want to discretize and use to build a probability mass function. I thought this would be common enough to warrant a single function that does it.

Here is the code:

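    import random
    import numpy as np
    import matplotlib.pyplot as plt

    x = [random.random() for _ in range(1000)]
    num_bins = 50
    counts, bins = np.histogram(x, bins=num_bins)
    # shift the left bin edges by half a bin width to get bin centers
    bins = bins[:-1] + (bins[1] - bins[0])/2
    # divide by the total count so the bar heights sum to 1
    probs = counts/float(counts.sum())
    print(probs.sum())  # 1.0 (up to floating-point rounding)
    plt.bar(bins, probs, 1.0/num_bins)
    plt.show()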
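A possible shortcut, sketched here as an assumption rather than as part of the original answer: np.histogram accepts a weights argument, so giving every sample a weight of 1/N should produce the per-bin probabilities directly. The seed, the fixed range=(0.0, 1.0), and the variable names are illustrative choices.

```python
import numpy as np

# Illustrative data: 1000 samples in [0, 1) from a seeded generator.
rng = np.random.default_rng(0)
x = rng.random(1000)
num_bins = 50

# Each sample contributes 1/N, so every bin height is the fraction
# of samples that fell into that bin.
weights = np.full(x.shape, 1.0 / x.size)
probs, edges = np.histogram(x, bins=num_bins, range=(0.0, 1.0), weights=weights)

print(probs.sum())  # sums to 1 up to floating-point rounding
```

matplotlib.pyplot.hist takes a weights argument as well, so the same array could be passed there to draw the bars without a separate call to np.histogram.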

I think you are mistaking a sum for an integral. A proper PDF (probability density function) integrates to unity; if you simply sum the bar heights, you leave out the width of each rectangle.

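    import numpy as np

    N = 10**5
    X = np.random.normal(size=N)
    counts, bins = np.histogram(X, bins=50, density=True)
    bins = bins[:-1] + (bins[1] - bins[0])/2
    # integrate the density over the bin centers with the trapezoidal rule
    # (np.trapz is named np.trapezoid in NumPy 2.0 and later)
    print(np.trapz(counts, bins))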

This gives roughly 0.999985, which is close enough to unity.
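To make the sum-versus-integral point concrete, here is a minimal sketch (the seed and the names density, widths and probs are illustrative, not from the answer above). With density=True each bar height is a density, so it is height times bin width that gives the probability of a bin, and those products are what should sum to 1.

```python
import numpy as np

# Illustrative data: normal samples from a seeded generator.
rng = np.random.default_rng(0)
X = rng.normal(size=10**5)

density, edges = np.histogram(X, bins=50, density=True)
widths = np.diff(edges)  # width of each bin

# Probability mass per bin = density * bin width.
probs = density * widths

print(density.sum())  # sum of bar heights, not 1 in general
print(probs.sum())    # sums to 1 up to floating-point rounding
```

So a density histogram can be turned into the probability-per-bin plot from the question by multiplying each height by its bin width.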

UPDATE: In response to the comment below:

If x = [.2, .2, .8] and I'm looking for a graph with two bars, one at .2 with a height of .66 because 66% of the values are at .2, and one at 0.8 with a height of .33, what is that graph called and how do I generate it?

The following code:

    from collections import Counter

    x = [.2, .2, .8]
    C = Counter(x)
    total = float(sum(C.values()))
    # replace each count with its relative frequency
    for key in C:
        C[key] /= total

This gives a dictionary-like object C = Counter({0.2: 0.666666, 0.8: 0.333333}). From here you can build a bar chart, but this only works if the distribution is discrete and takes only a finite, fixed set of values that are well separated from each other.
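As a rough sketch of the same idea with NumPy instead of Counter (an alternative, not the code from the answer): np.unique with return_counts=True returns the distinct values and how often each occurs, which is enough to get the bar positions and heights. It relies on exact floating-point equality, so it only makes sense for data that repeats exact values.

```python
import numpy as np

x = [.2, .2, .8]

# Distinct values (sorted) and the number of times each one appears.
values, counts = np.unique(x, return_counts=True)

# Relative frequency of each distinct value.
probs = counts / counts.sum()

for v, p in zip(values, probs):
    print(v, p)
```

The two arrays could then be passed to matplotlib.pyplot.bar(values, probs, width=0.05) to draw the chart; the width of 0.05 is an arbitrary display choice.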


Source: https://habr.com/ru/post/956445/

