I am taking Andrew Ng's machine learning course, and I am trying to wrap my head around the vectorized implementation of gradient descent for multiple variables, which is an optional exercise in the course.
This is the algorithm in question (taken from here):

repeat until convergence: {
    theta_j := theta_j - (alpha/m) * sum_{i=1..m} (h_theta(x^(i)) - y^(i)) * x_j^(i)    (update theta_j simultaneously for all j = 0, ..., n)
}

where the hypothesis is h_theta(x) = theta' * x.
I just cannot figure out how to implement this in Octave using sum, i.e. I am not sure how to multiply the summed term h_theta(x^(i)) - y^(i) by each of the variables x_j^(i). I tried different iterations of the following code, but to no avail (either the dimensions do not match or the answer is wrong):
theta = theta - alpha/m * sum(X * theta - y) * X;
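To make the failure concrete, here is a quick size check I ran on made-up data (m = 5 examples and n = 2 features are arbitrary placeholders):

m = 5; n = 2;
X = [ones(m, 1) rand(m, n)];    % m x (n+1) design matrix with intercept column
y = rand(m, 1);                 % m x 1 targets
theta = zeros(n + 1, 1);        % (n+1) x 1 parameters
size(sum(X * theta - y) * X)    % ans = [5 3]: sum collapses all residuals into one
                                % scalar, so the product is m x (n+1) and cannot be
                                % subtracted from the (n+1) x 1 theta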
The correct answer, however, is entirely non-obvious (to a linear algebra beginner like me, anyway; it comes from here):
theta = theta - (alpha/m * (X * theta - y)' * X)';
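For reference, this is the dimension bookkeeping I scribbled down on toy data (sizes again arbitrary) to convince myself that the transposes at least line up:

m = 5; n = 2;
X = [ones(m, 1) rand(m, n)];    % m x (n+1)
y = rand(m, 1);                 % m x 1
theta = rand(n + 1, 1);         % (n+1) x 1
size(X * theta - y)             % m x 1: the residuals h_theta(x^(i)) - y^(i)
size((X * theta - y)' * X)      % 1 x (n+1): entry j is the sum over i of
                                % (h_theta(x^(i)) - y^(i)) * x_j^(i), i.e. the
                                % row-times-matrix product performs the sum over i
                                % implicitly; the outer transpose turns the result
                                % back into a column that matches theta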
Is there a rule of thumb for cases where sum is involved that governs transformations like the one above?

And if so, is there an inverse version of that rule (i.e., going from a sum-based solution to plain matrix multiplication)? I ask because I was able to come up with a correct (albeit not very elegant) implementation using sum for gradient descent with a single variable:
temp0 = theta(1) - (alpha/m * sum(X * theta - y));
temp1 = theta(2) - (alpha/m * sum((X * theta - y)' * X(:, 2)));   % the inner product is already a scalar, so sum is a no-op here
theta(1) = temp0;
theta(2) = temp1;   % temporaries keep the update simultaneous
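The closest I have come to generalizing that sum-based pattern to several variables is the loop sketch below (it assumes X, y, theta, alpha and m are defined as in the exercise, and I may well be missing the point):

temp = zeros(size(theta));
for j = 1:length(theta)
    % each theta(j) gets its own sum, weighting the residuals by the j-th feature
    temp(j) = theta(j) - alpha/m * sum((X * theta - y) .* X(:, j));
end
theta = temp;                   % simultaneous update, as the algorithm requires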
Please note that this question concerns only vectorized implementations, and although there are several questions on SO about how this is done, mine is specifically about implementing the algorithm in Octave using sum.