How to eliminate the alternation effect:
The last line of the proof of Theorem 2 reads:
By reversing the roles of H and W, the divergence can similarly be shown to be nonincreasing under the update rule for W.
Thus, the update of H can be treated independently of the update of W. This means that after updating H:
H = H * H_coeff
we must also update the intermediate value WH before updating W:
WH = W.dot(H)
W = W * W_coeff
Then both updates reduce the divergence.
Try it: just insert WH = W.dot(H) before the computation of W_coeff, and the alternation effect disappears.
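For concreteness, here is a minimal sketch of the corrected update order on toy data (the shapes and seed are arbitrary). The expressions for H_coeff and W_coeff use the vectorized forms derived later in this section; the point here is only the placement of the WH refresh:

import numpy as np

rng = np.random.default_rng(1)
n, m, r = 10, 10, 6
V = rng.random((n, m)) + 0.01   # strictly positive toy data
W = rng.random((n, r))
H = rng.random((r, m))

WH = W.dot(H)
H_coeff = (np.dot((V / WH).T, W) / W.sum(axis=0)).T
H = H * H_coeff

WH = W.dot(H)                   # refresh WH before updating W
W_coeff = np.dot(V / WH, H.T) / H.sum(axis=1)
W = W * W_coeff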
Code simplification:
When working with NumPy arrays, use the mean and sum methods rather than the built-in Python sum function:
avg_V = sum(sum(V))/n/m
can be written as
avg_V = V.mean()
and
divergence = sum(sum(V * np.log(V/WH) - V + WH))
can be written as
divergence = ((V * np.log(V_over_WH)) - V + WH).sum()
Avoid the built-in Python sum function, because:
- it is slower than the NumPy sum method, and
- it is not as versatile as the NumPy sum method (it does not let us specify the axis to sum over).
Each rewrite above replaces two nested calls to Python's sum with a single call to the NumPy sum or mean method.
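To make the difference concrete, here is a small illustrative snippet (the 2x3 array is toy data, not from the factorization above):

import numpy as np

V = np.arange(6.0).reshape(2, 3)

print(sum(V))         # Python's sum adds the rows elementwise: [3. 5. 7.]
print(V.sum())        # NumPy sums every element: 15.0
print(V.sum(axis=0))  # column sums: [3. 5. 7.]
print(V.sum(axis=1))  # row sums: [3. 12.]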
Eliminate the triple for loop:
But an even greater improvement in speed and readability can be achieved by replacing
H_coeff = np.zeros(H.shape)
for a in range(r):
    for mu in range(m):
        for i in range(n):
            H_coeff[a, mu] += W[i, a] * V[i, mu] / WH[i, mu]
        H_coeff[a, mu] /= sum(W)[a]
H = H * H_coeff
with
V_over_WH = V / WH
H *= (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T
Explanation:
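For reference, the update rule for H referenced as Equation 5 can be written as follows (reconstructed here from the looped code above):

$$H_{a\mu} \leftarrow H_{a\mu} \, \frac{\sum_i W_{ia} \, V_{i\mu} / (WH)_{i\mu}}{\sum_k W_{ka}}$$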
If you look at this update rule for H, first notice that the indices of V and (WH) are identical. So you can replace V / (WH) with
V_over_WH = V/WH
Further, note that in the numerator we sum over the index i, which is the first index of both W and V_over_WH. We can express this sum as a matrix multiplication:
np.dot(V_over_WH.T, W).T
And the denominator is simply:
W.sum(axis=0)
Dividing the numerator by the denominator, and transposing at the end so that the denominator, indexed by a, broadcasts correctly,
(np.dot(V_over_WH.T, W) / W.sum(axis=0)).T
we get a matrix indexed by the two remaining indices, a and mu, in that order. These are exactly the indices of H, so we want to multiply H elementwise by this ratio. Conveniently, NumPy multiplies arrays elementwise by default.
Thus, we can express the whole update rule for H as
H *= (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T
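To gain confidence in the rewrite, the looped and vectorized updates can be compared numerically; a sketch along these lines (arbitrary toy shapes and seed) prints True:

import numpy as np

rng = np.random.default_rng(0)
n, m, r = 5, 4, 3
V = rng.random((n, m)) + 0.01  # strictly positive so V/WH is well-defined
W = rng.random((n, r))
H = rng.random((r, m))
WH = W.dot(H)

# Looped version, as in the original code.
H_coeff = np.zeros(H.shape)
for a in range(r):
    for mu in range(m):
        for i in range(n):
            H_coeff[a, mu] += W[i, a] * V[i, mu] / WH[i, mu]
        H_coeff[a, mu] /= sum(W)[a]

# Vectorized version.
V_over_WH = V / WH
vectorized = (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T

print(np.allclose(H_coeff, vectorized))  # True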
So, all together:
import numpy as np

np.random.seed(1)

def update(V, W, H, WH, V_over_WH):
    # equation (5)
    H *= (np.dot(V_over_WH.T, W) / W.sum(axis=0)).T
    WH = W.dot(H)
    V_over_WH = V / WH

    W *= np.dot(V_over_WH, H.T) / H.sum(axis=1)
    WH = W.dot(H)
    V_over_WH = V / WH

    return W, H, WH, V_over_WH

def factor(V, r, iterations=100):
    n, m = V.shape
    avg_V = V.mean()
    W = np.random.random(n * r).reshape(n, r) * avg_V
    H = np.random.random(r * m).reshape(r, m) * avg_V
    WH = W.dot(H)
    V_over_WH = V / WH

    for i in range(iterations):
        W, H, WH, V_over_WH = update(V, W, H, WH, V_over_WH)
        # equation (3)
        divergence = ((V * np.log(V_over_WH)) - V + WH).sum()
        print("At iteration {i}, the Kullback-Leibler divergence is {d}".format(
            i=i, d=divergence))

    return W, H

V = np.arange(0.01, 1.01, 0.01).reshape(10, 10)
# V = np.arange(1, 101).reshape(10, 10).astype('float')

W, H = factor(V, 6)
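As a quick check after the script finishes, one can compare the reconstruction W.dot(H) against V; this fragment assumes it is appended to the script above, so V, W, H, and np are already in scope:

approx = W.dot(H)
print(np.abs(V - approx).max())  # largest elementwise reconstruction error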