I am trying to implement a simple logistic regression algorithm in Julia. I would like to use the Optim.jl library to minimize my cost function, but I cannot get it to work.
My cost function and gradient look like this:
```julia
# my elementwise logistic function:
sigmoid(z) = 1 ./ (1 + exp(-z))

function cost(X, y, theta, lambda)
    m = length(y)
    h = sigmoid(X * theta)
    reg = (lambda / (2*m)) * sum(theta[2:end].^2)
    J = (1/m) * sum((-y).*log(h) - (1-y).*log(1-h)) + reg
    return J
end

function grad(X, y, theta, lambda, gradient)
    m = length(y)
    h = sigmoid(X * theta)
    # fill the preallocated gradient in place; the bias term theta[1] is not regularized
    gradient[:] = (1/m) * (X' * (h - y))
    gradient[2:end] += (lambda/m) * theta[2:end]
end
```
(where theta is the parameter vector for the hypothesis function, and lambda is the regularization parameter.)
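One way I sanity-check grad is against central finite differences (check_grad is just a throwaway debugging helper, using the X, y, and lambda from my workspace, not part of the model):

```julia
# quick finite-difference check of my analytic gradient (debugging helper only)
function check_grad(theta)
    n = length(theta)
    analytic = zeros(n)
    grad(X, y, theta, lambda, analytic)   # analytic gradient, filled in place
    step = 1e-6
    worst = 0.0
    for i in 1:n
        e = zeros(n)
        e[i] = step
        numeric = (cost(X, y, theta + e, lambda) -
                   cost(X, y, theta - e, lambda)) / (2 * step)
        worst = max(worst, abs(analytic[i] - numeric))
    end
    return worst   # should be tiny, around 1e-8 or smaller
end
```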
Then, in accordance with the instructions given at https://github.com/JuliaOpt/Optim.jl, I try to call the optimization function as follows:
```julia
# these are handle functions I define so I can pass them as arguments:
c(theta::Vector) = cost(X, y, theta, lambda)
g!(theta::Vector, gradient::Vector) = grad(X, y, theta, lambda, gradient)
```
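Checking the closure by hand at the starting point: with theta = 0 the hypothesis is h = 0.5 for every sample, so the cost should come out to exactly log(2) ≈ 0.6931 (which matches the L-BFGS output below):

```julia
julia> initial_theta = zeros(5);

julia> c(initial_theta)   # log(2), as expected when theta = 0
0.6931471805599453
```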
In both cases the optimizer reports that it did not converge, and the output looks awkward:
```
julia> optimize(c, initial_theta)
Results of Optimization Algorithm
 * Algorithm: Nelder-Mead
 * Starting Point: [0.0,0.0,0.0,0.0,0.0]
 * Minimum: [1.7787162051775145,3.4584135105727145,-6.659680628594007,4.776952006060713,1.5034743945407143]
 * Value of Function at Minimum: -Inf
 * Iterations: 1000
 * Convergence: false
   * |x - x'| < NaN: false
   * |f(x) - f(x')| / |f(x)| < 1.0e-08: false
   * |g(x)| < NaN: false
   * Exceeded Maximum Number of Iterations: true
 * Objective Function Calls: 1013
 * Gradient Call: 0

julia> optimize(c, g!, initial_theta, method = :l_bfgs)
Results of Optimization Algorithm
 * Algorithm: L-BFGS
 * Starting Point: [0.0,0.0,0.0,0.0,0.0]
 * Minimum: [-6.7055e-320,-2.235e-320,-6.7055e-320,-2.244e-320,-6.339759952602652e-7]
 * Value of Function at Minimum: 0.693148
 * Iterations: 1
 * Convergence: false
   * |x - x'| < 1.0e-32: false
   * |f(x) - f(x')| / |f(x)| < 1.0e-08: false
   * |g(x)| < 1.0e-08: false
   * Exceeded Maximum Number of Iterations: false
 * Objective Function Calls: 75
 * Gradient Call: 75
```
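One thing I notice in the Nelder-Mead run is the -Inf: if h ever reaches exactly 0 or 1, log(h) or log(1-h) becomes ±Inf (and 0 * Inf gives NaN in the y-weighted terms). A clipped variant of the cost would rule that out (just a sketch of the idea; I am not sure it is the right fix):

```julia
# experimental variant: clamp h away from exactly 0 and 1 so the log terms stay finite
function cost_clipped(X, y, theta, lambda)
    m = length(y)
    h = sigmoid(X * theta)
    h = Float64[clamp(v, eps(), 1 - eps()) for v in h]   # elementwise clamp
    reg = (lambda / (2*m)) * sum(theta[2:end].^2)
    return (1/m) * sum((-y).*log(h) - (1-y).*log(1-h)) + reg
end
```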
Question
Is my method (from my first code snippet) wrong, or am I misusing Optim.jl's features? In any case, what is the correct way to define and minimize the cost function here?
This is my first time with Julia and I’m probably doing something terribly wrong, but I can’t tell what exactly. Any help would be appreciated!
EDIT
X and y are my training set: X is a 90x5 matrix and y is a 90x1 vector (the data is taken from the Iris dataset - I do not think this is important).