numpy.random.normal(-2.5,0.1,1000) is a sample from the normal distribution. These are just 1000 numbers in random order. The documentation for entropy says:
pk[i] is the (possibly unnormalized) probability of event i.
So, for the result to be meaningful, the probability values need to be "aligned" so that the same index corresponds to the same position in both distributions. In your example, t1[0] is not related to t2[0] in any way. Your sample carries no direct information about how likely each value is, which is what you need for the KL divergence; it just gives you some concrete values that were drawn from the distribution.
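To make that concrete, here is a minimal sketch (assuming numpy is imported as np) of why the raw samples are not usable as pk/qk:

import numpy as np

t1 = np.random.normal(-2.5, 0.1, 1000)   # 1000 independent draws, in arbitrary order
t2 = np.random.normal(-2.5, 0.1, 1000)   # another 1000 independent draws

# These are sampled values (here all around -2.5), not probabilities of events,
# and shuffling either array gives an equally valid sample, so index i carries
# no shared meaning: t1[0] and t2[0] are simply unrelated numbers.
np.random.shuffle(t1)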
The easiest way to get aligned values is to evaluate the probability density function of each distribution on some fixed set of points. To do this, use scipy.stats.norm (which gives you a distribution object you can manipulate in various ways) instead of np.random.normal (which only returns sampled values). Here is an example:
from scipy import stats
import numpy as np

t1 = stats.norm(-2.5, 0.1)
t2 = stats.norm(-2.5, 0.1)
t3 = stats.norm(-2.4, 0.1)
t4 = stats.norm(-2.3, 0.1)

# domain to evaluate PDF on
x = np.linspace(-5, 5, 100)
Then:
>>> stats.entropy(t1.pdf(x), t2.pdf(x))
-0.0
>>> stats.entropy(t1.pdf(x), t3.pdf(x))
0.49999995020647586
>>> stats.entropy(t1.pdf(x), t4.pdf(x))
1.999999900414918
You can see that as the distributions move farther apart, their KL divergence increases. (In fact, using your second example will give a KL divergence of inf, because the two distributions overlap so little.)
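As a sanity check on those numbers: for two Gaussians with the same standard deviation sigma, the KL divergence has the closed form (mu1 - mu2)**2 / (2 * sigma**2), which is exactly 0.5 and 2.0 for the t3 and t4 cases above. The sketch below also shows the far-apart case; stats.norm(2.5, 0.1) is only an assumed stand-in for whatever your second example used.

from scipy import stats
import numpy as np

x = np.linspace(-5, 5, 100)
t1 = stats.norm(-2.5, 0.1)

# Closed form for equal-variance Gaussians: (mu1 - mu2)**2 / (2 * sigma**2)
print((-2.5 - -2.4) ** 2 / (2 * 0.1 ** 2))   # 0.5, matches the t3 estimate above
print((-2.5 - -2.3) ** 2 / (2 * 0.1 ** 2))   # 2.0, matches the t4 estimate above

# A distribution far away from t1 (assumed stand-in for the second example):
# the two PDFs barely overlap on this grid, so the estimated divergence is inf.
t5 = stats.norm(2.5, 0.1)
print(stats.entropy(t1.pdf(x), t5.pdf(x)))   # inf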