Computing mutual information in Python returns NaN

I implemented a mutual information formula in Python using pandas and numpy:

import numpy as np
import pandas as pd

def mutual_info(p):
    # p is a DataFrame holding the joint distribution:
    # rows index one variable, columns the other.
    p_x = p.sum(axis=1)   # marginal of the row variable
    p_y = p.sum(axis=0)   # marginal of the column variable
    I = 0.0
    for i_y in p.index:
        for i_x in p.columns:
            I += p.loc[i_y, i_x] * np.log2(p.loc[i_y, i_x] / (p_x[i_y] * p_y[i_x]))
    return I
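
For example, with a small made-up joint distribution containing one zero cell (hypothetical data, purely to reproduce the issue), the function returns nan:

p = pd.DataFrame([[0.25, 0.25],
                  [0.50, 0.00]],
                 index=['y0', 'y1'], columns=['x0', 'x1'])

print(mutual_info(p))  # nan: the zero cell makes 0 * log2(0 / ...) evaluate to nan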

However, if a cell in p has zero probability, then np.log2(p.loc[i_y,i_x]/(p_x[i_y]*p_y[i_x])) is negative infinity, the whole term is multiplied by zero, and the result becomes NaN.

What is the right way to get around this?

1 answer

For various theoretical and practical reasons (for example, see Competitive Distribution Estimation: Why is Good-Turing Good), you should never use a zero probability with a log-loss measure.

So, if you have a probability distribution p, then, for some scalar α > 0, you would use α 1 + (1 − α) p (where the 1 here denotes the uniform distribution, giving each cell probability 1/n). Unfortunately, there are no easy guidelines for choosing α, and you may have to assess the effect further down the calculation.
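
A minimal sketch of that smoothing step, assuming p is the joint-distribution DataFrame from the question (the helper name smooth and the value of alpha are just illustrative):

import pandas as pd

def smooth(p, alpha=0.01):
    # alpha * u + (1 - alpha) * p, where u puts probability 1/n on each of the n cells
    n = p.size
    u = pd.DataFrame(1.0 / n, index=p.index, columns=p.columns)
    return alpha * u + (1 - alpha) * p

p_smoothed = smooth(p, alpha=0.01)
mutual_info(p_smoothed)  # finite now, since no cell is exactly zero

Larger values of alpha remove the zeros more aggressively but perturb the estimate more, which is the trade-off behind choosing α mentioned above.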

For the Kullback-Leibler distance, you would of course apply this smoothing to each of the inputs.


Source: https://habr.com/ru/post/1624611/

