Entropy of normal data: histogram estimate vs. the direct formula (MATLAB)

Suppose we draw n=10000 samples from the standard normal distribution.

Now I want to estimate its entropy, using a histogram to estimate the probabilities.

1) calculate probabilities (e.g. using matlab)

 [p,x] = hist(samples,binnumbers);
 area = (x(2)-x(1))*sum(p);
 p = p/area;

(the number of bins is chosen according to some rule)

2) estimate the entropy

 H = -sum(p.*log2(p)) 

which gives 58.6488

But when I use the direct formula for the entropy of normal data, I get

 H = 0.5*log2(2*pi*exp(1))   % = 2.0471

What am I doing wrong when using histograms + entropy formulas? Thanks so much for any help!

1 answer

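You are missing a factor of dp (the bin width) in the sum. After normalization p is a density, so the probability mass of each bin is p*dp: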

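 dp = x(2)-x(1);                  % bin width
 area = sum(p)*dp;
 p = p/area;                      % normalize to a density (no-op if already done)
 H = -sum( (p*dp) .* log2(p) );   % element-wise product, weighted by dp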

That should bring you close enough ...
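
To spell out why the dp factor belongs there, here is a short sketch of the reasoning. The symbols f, f_i and \Delta are my notation, not the question's: f is the true density, f_i is the normalized histogram height in bin i (the p above), and \Delta is the bin width (the dp above). The formula 0.5*log2(2*pi*e) is the differential entropy, which is an integral, and the histogram sum is its Riemann approximation:

```latex
h(X) = -\int f(x)\,\log_2 f(x)\,dx
     \;\approx\; -\sum_i f_i \,\log_2(f_i)\,\Delta
```

The sum in the question leaves out \Delta, so it comes out larger than this approximation by a factor of 1/\Delta.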

PS
Be careful when you take log2(p), because some bins may be empty: there p is 0, log2(p) is -Inf, and 0*(-Inf) gives NaN. Have a look at nansum, which ignores NaN terms.
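
Putting the pieces together, a minimal end-to-end sketch might look like the following. The seed, the bin count nbins = 50 and the variable names are my own illustrative choices, not values from the question. Instead of nansum (which, as far as I know, needs the Statistics Toolbox) it skips empty bins with a logical mask, which should have the same effect here.

```matlab
% Sketch: histogram estimate of the differential entropy (in bits)
% of standard normal samples, compared with the closed-form value.
rng(0);                               % fix the seed for repeatability
n = 10000;
nbins = 50;                           % assumed bin count, chosen for illustration
samples = randn(n, 1);

[counts, x] = hist(samples, nbins);   % counts per bin and bin centers
dp = x(2) - x(1);                     % bin width (hist uses equally spaced bins)
p = counts / (sum(counts) * dp);      % density estimate, integrates to 1

nz = p > 0;                           % drop empty bins, where 0*log2(0) would be NaN
H_hist = -sum(p(nz) .* log2(p(nz))) * dp;

H_exact = 0.5 * log2(2*pi*exp(1));    % closed form for the standard normal, 2.0471
fprintf('histogram: %.4f   exact: %.4f\n', H_hist, H_exact);
```

The histogram value depends on the sample and on the number of bins, so expect it to land near the exact value rather than match it.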


Source: https://habr.com/ru/post/1480546/

