Non-negative matrix factorization does not converge

I am trying to implement non-negative matrix factorization using the Kullback-Leibler divergence as a measure of similarity. The algorithm is described in http://hebb.mit.edu/people/seung/papers/nmfconverge.pdf . Below is my Python/NumPy implementation, with an example matrix to run it on.

In a nutshell, the algorithm should find matrices W (n × r) and H (r × m) such that V (n × m) is approximately WH. You start with random values of W and H and, by following the update rules described in Lee and Seung's paper, you should get closer and closer to a good approximation for W and H.

The algorithm is proven to monotonically decrease the divergence measure, but that is not what happens in my implementation. Instead, it settles into an alternation between two divergence values. If you look at W and H, you can see that the resulting factorization is not particularly good.

One thing I wondered about is whether to use the updated or the old H when computing the update for W. I tried it both ways, and it does not change the behavior of the implementation.

I have checked my implementation against the paper several times and cannot see what I am doing wrong. Can anyone shed some light on the problem?

    import numpy as np

    def update(V, W, H, r, n, m):
        WH = W.dot(H)

        # equation (5)
        H_coeff = np.zeros(H.shape)
        for a in range(r):
            for mu in range(m):
                for i in range(n):
                    H_coeff[a, mu] += W[i, a] * V[i, mu] / WH[i, mu]
                H_coeff[a, mu] /= sum(W)[a]
        H = H * H_coeff

        W_coeff = np.zeros(W.shape)
        for i in range(n):
            for a in range(r):
                for mu in range(m):
                    W_coeff[i, a] += H[a, mu] * V[i, mu] / WH[i, mu]
                W_coeff[i, a] /= sum(H.T)[a]
        W = W * W_coeff

        return W, H

    def factor(V, r, iterations=100):
        n, m = V.shape
        avg_V = sum(sum(V)) / n / m
        W = np.random.random(n * r).reshape(n, r) * avg_V
        H = np.random.random(r * m).reshape(r, m) * avg_V

        for i in range(iterations):
            WH = W.dot(H)
            divergence = sum(sum(V * np.log(V / WH) - V + WH))  # equation (3)
            print("At iteration " + str(i) + ", the Kullback-Leibler divergence is", divergence)
            W, H = update(V, W, H, r, n, m)
        return W, H

    V = np.arange(0.01, 1.01, 0.01).reshape(10, 10)
    W, H = factor(V, 6)
1 answer

How to eliminate the alternation effect:

The last line of the proof of Theorem 2 reads:

By reversing the roles of H and W, the update rule for W can similarly be shown to be nonincreasing.

Thus, we can surmise that the update for H can be done independently of the update for W. This means that after updating H:

 H = H * H_coeff 

we should also update the intermediate value WH before updating W:

    WH = W.dot(H)
    W = W * W_coeff

Both updates then decrease the divergence.

Try it: just insert WH = W.dot(H) before the computation of W_coeff, and the alternation effect disappears.
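As a quick end-to-end check, here is a minimal sketch of the fix. It uses vectorized forms of the same Equation 5 updates rather than the original triple loops, and refreshes WH between the two updates; with that in place, the divergence never increases:

```python
import numpy as np

def update(V, W, H):
    WH = W.dot(H)
    # equation (5), update for H (vectorized form of the triple loop)
    H = H * (np.dot(W.T, V / WH) / W.sum(axis=0)[:, None])
    WH = W.dot(H)               # <-- the fix: refresh WH with the new H
    W = W * (np.dot(V / WH, H.T) / H.sum(axis=1)[None, :])
    return W, H

np.random.seed(0)
V = np.arange(0.01, 1.01, 0.01).reshape(10, 10)
W = np.random.random((10, 4))
H = np.random.random((4, 10))

prev = np.inf
for _ in range(50):
    W, H = update(V, W, H)
    WH = W.dot(H)
    d = (V * np.log(V / WH) - V + WH).sum()   # equation (3)
    assert d <= prev + 1e-9                   # divergence never increases
    prev = d
```

The two asserts exercise exactly the claim above: once WH is recomputed after the H update, the sequence of divergence values is nonincreasing instead of alternating.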


Code simplification:

When working with NumPy arrays, use the mean and sum methods rather than the Python built-in sum function:

 avg_V = sum(sum(V))/n/m 

can be written as

 avg_V = V.mean() 

and

 divergence = sum(sum(V * np.log(V/WH) - V + WH)) # equation (3) 

can be written as

 divergence = ((V * np.log(V / WH)) - V + WH).sum() 

Avoid the Python built-in sum function, because

  • it is slower than the NumPy sum method, and
  • it is not as versatile as the NumPy sum method (it does not let you specify the axis to sum over). Here, two nested calls to Python's sum can be replaced by a single call to NumPy's sum or mean method.
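The difference is easy to see on a small array (a sketch with a made-up 2×3 example): Python's built-in sum iterates over the first axis and returns an array of column sums, while the NumPy method can total everything at once or along a chosen axis:

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)   # [[0., 1., 2.], [3., 4., 5.]]

print(sum(A))           # [3. 5. 7.] -- column sums, via a Python-level loop
print(A.sum(axis=0))    # [3. 5. 7.] -- same result, computed in C
print(A.sum(axis=1))    # [3. 12.]   -- row sums; built-in sum can't do this directly
print(A.sum())          # 15.0       -- grand total, replacing sum(sum(A))
print(A.mean())         # 2.5        -- replacing sum(sum(A)) / n / m
```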

Eliminate the triple for-loop:

But an even bigger improvement in both speed and readability comes from replacing

    H_coeff = np.zeros(H.shape)
    for a in range(r):
        for mu in range(m):
            for i in range(n):
                H_coeff[a, mu] += W[i, a] * V[i, mu] / WH[i, mu]
            H_coeff[a, mu] /= sum(W)[a]
    H = H * H_coeff

with

    V_over_WH = V / WH
    H *= (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T

Explanation:

If you look at the Equation 5 update rule for H, first notice that the indices on V and (WH) are identical. So you can replace V / (WH) with

 V_over_WH = V/WH 

Next, note that in the numerator we sum over the index i, which is the first index of both W and V_over_WH. We can express this as a matrix multiplication:

 np.dot(V_over_WH.T, W).T 

And the denominator is simply:

 W.sum(axis=0).T 
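A detail worth noting here (a quick sketch with made-up shapes): W.sum(axis=0) is a 1-D array, so the .T is a no-op, and dividing an (m, r) matrix by a length-r vector broadcasts along the columns, dividing column a by the sum of W's column a:

```python
import numpy as np

W = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
col_sums = W.sum(axis=0)                      # [ 9. 12.]
assert np.array_equal(col_sums.T, col_sums)   # transposing a 1-D array changes nothing

# (m, r) / (r,) broadcasts: each column is divided elementwise by its sum
numerator = np.ones((4, 2))
ratio = numerator / col_sums
assert np.allclose(ratio[:, 0], 1.0 / 9.0)
assert np.allclose(ratio[:, 1], 1.0 / 12.0)
```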

If we divide the numerator by the denominator,

 (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T 

we get a matrix indexed by the two remaining indices, alpha and mu, in that order. That matches the indexing of H. So we want to multiply H element-wise by this ratio. Perfect: NumPy multiplies arrays element-wise by default.

Thus, we can express the whole update rule for H as

 H *= (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T 
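To trust the rewrite without taking the index bookkeeping on faith, the one-liner can be checked directly against the original triple loop on random data (a small sketch with arbitrary shapes); both must agree up to floating-point rounding:

```python
import numpy as np

np.random.seed(2)
n, r, m = 5, 3, 4
V = np.random.random((n, m)) + 0.1
W = np.random.random((n, r))
H = np.random.random((r, m))
WH = W.dot(H)

# Original triple-loop version of the equation (5) update for H
H_coeff = np.zeros(H.shape)
for a in range(r):
    for mu in range(m):
        for i in range(n):
            H_coeff[a, mu] += W[i, a] * V[i, mu] / WH[i, mu]
        H_coeff[a, mu] /= W.sum(axis=0)[a]
H_loops = H * H_coeff

# Vectorized version
V_over_WH = V / WH
H_vec = H * (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T

assert np.allclose(H_loops, H_vec)
```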

Putting it all together:

    import numpy as np
    np.random.seed(1)

    def update(V, W, H, WH, V_over_WH):
        # equation (5)
        H *= (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T
        WH = W.dot(H)
        V_over_WH = V / WH

        W *= np.dot(V_over_WH, H.T) / H.sum(axis=1)
        WH = W.dot(H)
        V_over_WH = V / WH
        return W, H, WH, V_over_WH

    def factor(V, r, iterations=100):
        n, m = V.shape
        avg_V = V.mean()
        W = np.random.random(n * r).reshape(n, r) * avg_V
        H = np.random.random(r * m).reshape(r, m) * avg_V
        WH = W.dot(H)
        V_over_WH = V / WH

        for i in range(iterations):
            W, H, WH, V_over_WH = update(V, W, H, WH, V_over_WH)
            # equation (3)
            divergence = ((V * np.log(V_over_WH)) - V + WH).sum()
            print("At iteration {i}, the Kullback-Leibler divergence is {d}".format(
                i=i, d=divergence))
        return W, H

    V = np.arange(0.01, 1.01, 0.01).reshape(10, 10)
    # V = np.arange(1,101).reshape(10,10).astype('float')
    W, H = factor(V, 6)

Source: https://habr.com/ru/post/1481009/

