What's wrong with my Gradient Descent algorithm?

Hi, I am trying to implement the Gradient Descent algorithm for a function:

    E(u,v) = (u*exp(v) - 2*v*exp(-u))^2

My starting point for the algorithm is w = (u, v) = (2, 2). The learning rate is eta = 0.01 and the error bound is bound = 10^-14. Here is my MATLAB code:

    function [resultTable, boundIter] = gradientDescent(w, iters, bound, eta)
    % FUNCTION [resultTable, boundIter] = gradientDescent(w, iters, bound, eta)
    %
    % DESCRIPTION:
    % - This function will do gradient descent error minimization for the
    %   function E(u,v) = (u*exp(v) - 2*v*exp(-u))^2.
    %
    % INPUTS:
    %   'w'     a 1-by-2 vector indicating initial weights, w = [u, v]
    %   'iters' a positive integer indicating the number of gradient descent
    %           iterations
    %   'bound' a real number indicating an error lower bound
    %   'eta'   a positive real number indicating the learning rate of the
    %           GD algorithm
    %
    % OUTPUTS:
    %   'resultTable' an (iters+1)-by-6 table indicating the error, partial
    %                 derivatives and weights for each GD iteration
    %   'boundIter'   a positive integer specifying the GD iteration when the
    %                 error function got below the given error bound 'bound'

    % The error function
    E = @(u,v) (u*exp(v) - 2*v*exp(-u))^2;
    % Partial derivative of E with respect to u
    pEpu = @(u,v) 2*(u*exp(v) - 2*v*exp(-u))*(exp(v) + 2*v*exp(-u));
    % Partial derivative of E with respect to v
    pEpv = @(u,v) 2*(u*exp(v) - 2*v*exp(-u))*(u*exp(v) - 2*exp(-u));

    % Initialize boundIter
    boundIter = 0;
    % Create a table for holding the results
    resultTable = zeros(iters+1, 6);

    % Row 1 holds the initial state (iteration 0)
    resultTable(1, 1) = 0;                  % iteration number
    resultTable(1, 2) = E(w(1), w(2));      % error at initial w
    resultTable(1, 3) = pEpu(w(1), w(2));   % pEpu at initial w = (u,v)
    resultTable(1, 4) = pEpv(w(1), w(2));   % pEpv at initial w = (u,v)
    resultTable(1, 5) = w(1);               % initial u
    resultTable(1, 6) = w(2);               % initial v

    % Loop over all the iterations
    for i = 2:iters+1
        % Save the iteration number
        resultTable(i, 1) = i-1;
        % Update the weights (simultaneously, via temporaries)
        temp1 = w(1) - eta*pEpu(w(1), w(2));
        temp2 = w(2) - eta*pEpv(w(1), w(2));
        w(1) = temp1;
        w(2) = temp2;
        % Evaluate the error function and partials at the new weights
        resultTable(i, 2) = E(w(1), w(2));
        resultTable(i, 3) = pEpu(w(1), w(2));
        resultTable(i, 4) = pEpv(w(1), w(2));
        % Save the new weights
        resultTable(i, 5) = w(1);
        resultTable(i, 6) = w(2);
        % If the error function is below the specified bound, save this
        % iteration index
        if E(w(1), w(2)) < bound
            boundIter = i-1;
        end
    end
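For completeness, this is how I call the function for the run shown below (just a minimal driver; the variable names here are my own):

    % Run five iterations of gradient descent from w = (2, 2)
    w0    = [2, 2];
    iters = 5;
    bound = 1e-14;
    eta   = 0.01;
    [resultTable, boundIter] = gradientDescent(w0, iters, bound, eta);
    disp(resultTable)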

This is an exercise in my machine learning course, but for some reason my results are wrong. There must be something wrong with the code. I have tried debugging it over and over and have not found anything wrong ... can someone see what my problem is? In other words, can you verify that the code is a valid gradient descent algorithm for this function?

Please let me know if my question is too unclear or if you need more information.

Thanks for your efforts and help! =)

Here are my results for five iterations, alongside the results other people got:

PARAMETERS: w = [2, 2], eta = 0.01, bound = 10^-14, iters = 5

[Image: table of the error, partial derivatives and weights for the first five GD iterations]

3 answers

Following the discussion in the comments below the question: I would say that the others are wrong ... your minimization leads to lower values of E(u,v). Check:

 E(1.4,1.6) = 37.8 >> 3.6 = E(0.63, -1.67) 
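You can reproduce these two numbers directly (a quick sanity check using the same E as in the question; the printed values are rounded):

    E = @(u,v) (u*exp(v) - 2*v*exp(-u))^2;
    E(1.4, 1.6)      % ans is approx. 37.76
    E(0.63, -1.67)   % ans is approx. 3.60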

Not a complete answer, but here goes:

I added a snippet to your code so you can see what is happening.

    % Note: for the vector/matrix evaluations below, E must be element-wise:
    E = @(u,v) (u.*exp(v) - 2*v.*exp(-u)).^2;

    u1 = resultTable(:,5);
    v1 = resultTable(:,6);
    E1 = E(u1, v1);
    E1(E1 < bound) = NaN;    % drop values below the bound from the plot

    [x, y] = meshgrid(-1:0.1:5, -5:0.1:2);
    Z = E(x, y);
    surf(x, y, Z)
    hold on
    plot3(u1, v1, E1, 'r')   % descent path on the error surface
    plot3(u1, v1, E1, 'r*')

[Image: surface plot of E(u,v) with the descent path overlaid in red]

The result shows that your algorithm works correctly for this function. So, as the other answer said: either everyone else is wrong, or you are not starting from the correct equation.
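If the worry is a mis-derived gradient, one way to rule that out is a symbolic check (my addition, not part of the original answer; it assumes the Symbolic Math Toolbox is available):

    syms u v
    E_sym = (u*exp(v) - 2*v*exp(-u))^2;
    simplify(diff(E_sym, u))   % compare against pEpu in the question
    simplify(diff(E_sym, v))   % compare against pEpv in the question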


(I apologize for not just commenting, but I'm new to SO and can't comment.)

It looks like your algorithm is doing the right thing. What you want to be sure of is that the energy decreases at every step (and it does). There are several reasons why your data may not match others' in the class: someone may simply be wrong (you or the others), they may have started at a different point, or they may have used a different step size (what you are calling eta, I believe).

Ideally, you do not want to hard-code the number of iterations. You want to continue until you reach a local minimum (which, hopefully, is the global minimum). To check this, you want both partial derivatives to be zero (or very close to zero). In addition, to make sure you are at a local minimum (not a local maximum or a saddle point), you should check the sign of E_uu * E_vv - E_uv^2 and the sign of E_uu; see http://en.wikipedia.org/wiki/Second_partial_derivative_test (the second derivative test) for more information. If you land on a local maximum or a saddle point, your gradient will tell you not to move (since the partial derivatives are 0). Since you know that this is not optimal, you have to perturb your solution (this is sometimes called simulated annealing). Both checks are sketched below.
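A minimal sketch of those two checks, under my own assumptions (the tolerance tol, the finite-difference step h, and the iteration cap maxIters are all my choices, not part of the original code):

    % Iterate until the gradient is (nearly) zero instead of a fixed count
    E    = @(u,v) (u*exp(v) - 2*v*exp(-u))^2;
    pEpu = @(u,v) 2*(u*exp(v) - 2*v*exp(-u))*(exp(v) + 2*v*exp(-u));
    pEpv = @(u,v) 2*(u*exp(v) - 2*v*exp(-u))*(u*exp(v) - 2*exp(-u));
    eta = 0.01; tol = 1e-10; maxIters = 1e5;
    w = [2, 2];
    k = 0;
    while norm([pEpu(w(1),w(2)), pEpv(w(1),w(2))]) > tol && k < maxIters
        w = w - eta * [pEpu(w(1),w(2)), pEpv(w(1),w(2))];
        k = k + 1;
    end

    % Second derivative test at the final point, via finite differences
    h   = 1e-5;
    u0  = w(1); v0 = w(2);
    Euu = (E(u0+h,v0) - 2*E(u0,v0) + E(u0-h,v0)) / h^2;
    Evv = (E(u0,v0+h) - 2*E(u0,v0) + E(u0,v0-h)) / h^2;
    Euv = (E(u0+h,v0+h) - E(u0+h,v0-h) - E(u0-h,v0+h) + E(u0-h,v0-h)) / (4*h^2);
    D   = Euu*Evv - Euv^2;
    if D > 0 && Euu > 0
        disp('local minimum')
    elseif D > 0 && Euu < 0
        disp('local maximum')
    elseif D < 0
        disp('saddle point')
    else
        disp('test inconclusive (D = 0)')
    end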

Hope this helps.

