I am taking Andrew Ng's machine learning course, and I am trying to wrap my head around the vectorized implementation of gradient descent for multiple variables, which is an optional exercise in the course.
This is the algorithm in question (taken from here):

repeat until convergence: {
    theta_j := theta_j - (alpha/m) * sum_{i=1..m} (h_theta(x^(i)) - y^(i)) * x_j^(i)    (update theta_j simultaneously for all j = 0, ..., n)
}

where the hypothesis is h_theta(x) = theta' * x.
I just cannot figure out how to implement this in Octave using sum, i.e. I am not sure how to multiply the summed term h_theta(x^(i)) - y^(i) by each of the variables x_j^(i). I tried different iterations of the following code, but to no avail (either the dimensions do not match or the answer is wrong):
theta = theta - alpha/m * sum(X * theta - y) * X;
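To make the failure concrete, here is a quick size check I ran on made-up data (m = 5 examples and n = 2 features are arbitrary placeholders):

m = 5; n = 2;
X = [ones(m, 1) rand(m, n)];    % m x (n+1) design matrix with intercept column
y = rand(m, 1);                 % m x 1 targets
theta = zeros(n + 1, 1);        % (n+1) x 1 parameters
size(sum(X * theta - y) * X)    % ans = [5 3]: sum collapses all residuals into one
                                % scalar, so the product is m x (n+1) and cannot be
                                % subtracted from the (n+1) x 1 theta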
The correct answer, however, is entirely non-obvious (to a linear algebra beginner like me, anyway; it comes from here):
theta = theta - (alpha/m * (X * theta - y)' * X)';
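For reference, this is the dimension bookkeeping I scribbled down on toy data (sizes again arbitrary) to convince myself that the transposes at least line up:

m = 5; n = 2;
X = [ones(m, 1) rand(m, n)];    % m x (n+1)
y = rand(m, 1);                 % m x 1
theta = rand(n + 1, 1);         % (n+1) x 1
size(X * theta - y)             % m x 1: the residuals h_theta(x^(i)) - y^(i)
size((X * theta - y)' * X)      % 1 x (n+1): entry j is the sum over i of
                                % (h_theta(x^(i)) - y^(i)) * x_j^(i), i.e. the
                                % row-times-matrix product performs the sum over i
                                % implicitly; the outer transpose turns the result
                                % back into a column that matches theta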
Is there a rule of thumb for cases where sum is involved that governs transformations like the one above?

And if so, is there an inverse version of that rule (i.e., going from a sum-based solution to plain matrix multiplication)? I ask because I was able to come up with a correct (albeit not very elegant) implementation using sum for gradient descent with a single variable:
temp0 = theta(1) - (alpha/m * sum(X * theta - y));
temp1 = theta(2) - (alpha/m * sum((X * theta - y)' * X(:, 2)));   % the inner product is already a scalar, so sum is a no-op here
theta(1) = temp0;
theta(2) = temp1;   % temporaries keep the update simultaneous
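The closest I have come to generalizing that sum-based pattern to several variables is the loop sketch below (it assumes X, y, theta, alpha and m are defined as in the exercise, and I may well be missing the point):

temp = zeros(size(theta));
for j = 1:length(theta)
    % each theta(j) gets its own sum, weighting the residuals by the j-th feature
    temp(j) = theta(j) - alpha/m * sum((X * theta - y) .* X(:, j));
end
theta = temp;                   % simultaneous update, as the algorithm requires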
Please note that this question concerns only vectorized implementations, and although there are several questions on SO about how this is done, mine is specifically about implementing the algorithm in Octave using sum.