Efficient way to calculate Kullback-Leibler divergence in Python

I need to calculate the Kullback-Leibler Divergence (KLD) between thousands of discrete probability vectors. I am currently using the following code, but it is too slow for my purposes. I was wondering if there is a faster way to calculate KL Divergence?

import numpy as np
import scipy.stats as sc

n = distributions.shape[0]  # n is the number of data points
kld = np.zeros((n, n))
for i in range(0, n):
    for j in range(0, n):
        if i != j:
            kld[i, j] = sc.entropy(distributions[i, :], distributions[j, :])
1 answer

SciPy's stats.entropy is normally fed 1D probability vectors, but it reduces along an axis, so we can leverage broadcasting to compute all pairwise divergences in a single vectorized call.

From the docs -

scipy.stats.entropy(pk, qk=None, base=None)

If only probabilities pk are given, the entropy is calculated as S = -sum(pk * log(pk), axis=0).

If qk is not None, then compute the Kullback-Leibler divergence S = sum(pk * log(pk / qk), axis=0).
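For intuition, here is a minimal sketch (the toy vectors p and q are my own, not from the question) showing that a single call with two 1D arrays returns one KL divergence value; stats.entropy normalizes its inputs before computing anything:

import numpy as np
from scipy import stats

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.1, 0.6, 0.3])

kl_pq = stats.entropy(p, q)              # KL(p || q) via SciPy
kl_manual = np.sum(p * np.log(p / q))    # the same sum written out by hand
print(np.isclose(kl_pq, kl_manual))      # True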

We want an output of shape (M, M), where M is the number of data points (distributions), with element (i, j) holding the divergence between distribution i and distribution j.

Since stats.entropy() reduces along axis=0, we transpose distributions so that each distribution runs along the first axis, then add singleton axes so the trailing dimensions of the two inputs are (M, 1) and (1, M). Broadcasting then produces the full (M, M) result in one call.

Thus, the vectorized solution would be -

from scipy import stats
kld = stats.entropy(distributions.T[:,:,None], distributions.T[:,None,:])
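To see why the indexing above works, here is a small shape walk-through (toy sizes n=5, m=8 chosen purely for illustration): transposing gives (m, n), the added singleton axes give (m, n, 1) and (m, 1, n), and the axis=0 reduction leaves the desired (n, n) matrix of pairwise divergences:

import numpy as np
from scipy import stats

n, m = 5, 8                               # toy sizes: n distributions of length m
distributions = np.random.rand(n, m)      # stats.entropy normalizes the inputs internally

a = distributions.T[:, :, None]           # shape (m, n, 1)
b = distributions.T[:, None, :]           # shape (m, 1, n)
kld = stats.entropy(a, b)                 # broadcasts to (m, n, n), reduces along axis=0
print(a.shape, b.shape, kld.shape)        # (8, 5, 1) (8, 1, 5) (5, 5)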

Runtime test -

In [15]: def entropy_loopy(distrib):
    ...:     n = distrib.shape[0] #n is the number of data points
    ...:     kld = np.zeros((n, n))
    ...:     for i in range(0, n):
    ...:         for j in range(0, n):
    ...:             if(i != j):
    ...:                 kld[i, j] = stats.entropy(distrib[i, :], distrib[j, :])
    ...:     return kld
    ...: 

In [16]: distrib = np.random.randint(0,9,(100,100)) # Setup input

In [17]: out = stats.entropy(distrib.T[:,:,None], distrib.T[:,None,:])

In [18]: np.allclose(entropy_loopy(distrib),out) # Verify
Out[18]: True

In [19]: %timeit entropy_loopy(distrib)
1 loops, best of 3: 800 ms per loop

In [20]: %timeit stats.entropy(distrib.T[:,:,None], distrib.T[:,None,:])
10 loops, best of 3: 104 ms per loop
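One practical caveat that goes beyond the original answer: the single broadcasted call materializes an intermediate of roughly shape (m, n, n), which can exhaust memory when n is in the thousands. Below is a hedged sketch of a chunked variant (the helper pairwise_kld_chunked and its chunk size are my own, not from the post) that keeps the vectorized speed while bounding memory:

import numpy as np
from scipy import stats

def pairwise_kld_chunked(distributions, chunk=256):
    # Compute the full (n, n) KL matrix in row blocks of size `chunk`.
    n = distributions.shape[0]
    kld = np.empty((n, n))
    dT = distributions.T                  # shape (m, n)
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        # Rows start..stop of the KL matrix, still vectorized via broadcasting.
        kld[start:stop] = stats.entropy(dT[:, start:stop, None], dT[:, None, :])
    return kld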

Source: https://habr.com/ru/post/1618024/
