Logistic regression in Julia using Optim.jl

I am trying to implement a simple logistic regression algorithm in Julia. I would like to use the Optim.jl library to minimize my cost function, but I cannot get it to work.

My cost function and gradient look like this:

    function cost(X, y, theta, lambda)
        m = length(y)
        h = sigmoid(X * theta)
        reg = (lambda / (2*m)) * sum(theta[2:end].^2)
        J = (1/m) * sum( (-y).*log(h) - (1-y).*log(1-h) ) + reg
        return J
    end

    function grad(X, y, theta, lambda, gradient)
        m = length(y)
        h = sigmoid(X * theta)
        # gradient = zeros(size(theta))
        gradient = (1/m) * X' * (h - y)
        gradient[2:end] = gradient[2:end] + (lambda/m) * theta[2:end]
        return gradient
    end

(where theta is the parameter vector for the hypothesis function, and lambda is the regularization parameter.)
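
sigmoid is the usual element-wise logistic function, i.e. something like:

    sigmoid(z) = 1.0 ./ (1.0 + exp(-z))   # element-wise logistic function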

Then, following the instructions given here: https://github.com/JuliaOpt/Optim.jl, I try to call the optimization function as follows:

    # these are wrapper functions I define so I can pass them as arguments:
    c(theta::Vector) = cost(X, y, theta, lambda)
    g!(theta::Vector, gradient::Vector) = grad(X, y, theta, lambda, gradient)

    # then I do
    optimize(c, some_initial_theta)

    # or maybe try a different algorithm:
    optimize(c, g!, initial_theta, method = :l_bfgs)

In both cases it reports that it did not converge, and the output looks odd:

    julia> optimize(c,initial_theta)
    Results of Optimization Algorithm
     * Algorithm: Nelder-Mead
     * Starting Point: [0.0,0.0,0.0,0.0,0.0]
     * Minimum: [1.7787162051775145,3.4584135105727145,-6.659680628594007,4.776952006060713,1.5034743945407143]
     * Value of Function at Minimum: -Inf
     * Iterations: 1000
     * Convergence: false
       * |x - x'| < NaN: false
       * |f(x) - f(x')| / |f(x)| < 1.0e-08: false
       * |g(x)| < NaN: false
       * Exceeded Maximum Number of Iterations: true
     * Objective Function Calls: 1013
     * Gradient Call: 0

    julia> optimize(c,g!,initial_theta,method = :l_bfgs)
    Results of Optimization Algorithm
     * Algorithm: L-BFGS
     * Starting Point: [0.0,0.0,0.0,0.0,0.0]
     * Minimum: [-6.7055e-320,-2.235e-320,-6.7055e-320,-2.244e-320,-6.339759952602652e-7]
     * Value of Function at Minimum: 0.693148
     * Iterations: 1
     * Convergence: false
       * |x - x'| < 1.0e-32: false
       * |f(x) - f(x')| / |f(x)| < 1.0e-08: false
       * |g(x)| < 1.0e-08: false
       * Exceeded Maximum Number of Iterations: false
     * Objective Function Calls: 75
     * Gradient Call: 75

Question

Is my approach (from the first code snippet) wrong, or am I misusing Optim.jl? Either way, what is the correct way to define and minimize the cost function here?

This is my first time with Julia and I’m probably doing something terribly wrong, but I can’t say what exactly. Any help would be appreciated!

EDIT

X and y are my training set: X is a 90x5 matrix and y is a 90x1 vector (the data is taken from the Iris dataset; I do not think this is important).

2 answers

Below are the cost and gradient calculation functions for logistic regression, written using closures and currying (a version for those who prefer a function that returns both the value and the gradient):

    function cost_gradient(θ, X, y, λ)
        m = length(y)
        # returns a tuple: (cost closure, in-place gradient closure)
        return (θ::Array) -> begin
            # regularized cross-entropy cost (the intercept θ[1] is not regularized)
            h = sigmoid(X * θ)
            J = (1 / m) * sum(-y .* log(h) .- (1 - y) .* log(1 - h)) + λ / (2 * m) * sum(θ[2:end] .^ 2)
        end, (θ::Array, storage::Array) -> begin
            # gradient written in place into `storage`, as Optim.jl expects
            h = sigmoid(X * θ)
            storage[:] = (1 / m) * (X' * (h .- y)) + (λ / m) * [0; θ[2:end]]
        end
    end
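
Note the storage[:] = ... assignment in the gradient closure: Optim.jl passes in an array and expects g! to write the gradient into it in place. Rebinding the argument, as the question's grad does with gradient = (1/m) * X' * (h - y), only changes the local name, so Optim never sees the computed gradient (a likely reason the L-BFGS run above stalled after one iteration). A minimal illustration, separate from the solution code:

    # Toy example: only the in-place version makes the result visible to the caller.
    function g_rebound!(θ::Array, storage::Array)
        storage = 2 .* θ        # rebinds the local name; the caller's array is untouched
    end

    function g_inplace!(θ::Array, storage::Array)
        storage[:] = 2 .* θ     # writes into the caller's array
    end

    s = zeros(3)
    g_rebound!([1.0, 2.0, 3.0], s)   # s is still [0.0, 0.0, 0.0]
    g_inplace!([1.0, 2.0, 3.0], s)   # s is now [2.0, 4.0, 6.0]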

Implementation of the sigmoid function:

    sigmoid(z) = 1.0 ./ (1.0 + exp(-z))

To use cost_gradient with Optim.jl, follow these steps:

    using Optim
    # ...
    # Prerequisites:
    #   X size is (m,d), where d is the number of training set features
    #   y size is (m,1)
    #   λ is the regularization parameter, e.g. 1.5
    #   ITERATIONS is the maximum number of iterations, e.g. 1000

    X = [ones(size(X,1)) X]          # add the x_0 = 1.0 column; now X size is (m,d+1)
    initialθ = zeros(size(X,2), 1)   # initialθ size is (d+1,1)

    cost, gradient! = cost_gradient(initialθ, X, y, λ)
    res = optimize(cost, gradient!, initialθ, method = ConjugateGradient(), iterations = ITERATIONS);
    θ = Optim.minimizer(res);
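
Depending on your Optim.jl version, you can also inspect the result object before using θ (a quick sanity check; these accessors exist in recent Optim.jl releases):

    Optim.converged(res)    # whether a convergence criterion was met
    Optim.minimum(res)      # value of the cost function at the solution
    Optim.iterations(res)   # number of iterations actually performed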

Now you can easily make predictions (for example, on a test set):

    predictions = sigmoid(X * θ)   # X size is (m,d+1)
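
If y holds 0/1 labels, you can turn these probabilities into class predictions and a rough accuracy figure; a small sketch, assuming the usual 0.5 cutoff:

    predicted_labels = predictions .>= 0.5               # classify as 1 when the predicted probability is at least 0.5
    accuracy = sum(predicted_labels .== y) / length(y)   # fraction of correctly classified examples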

Try my approach or compare it with your implementation.


Here is an example of unregularized logistic regression that uses the automatic differentiation feature of Optim.jl. It may help you with your own implementation.

    using Optim

    const X = rand(100, 3)
    const true_β = [5, 2, 4]
    const true_y = 1 ./ (1 + exp(-X * true_β))

    function objective(β)
        y = 1 ./ (1 + exp(-X * β))
        return sum((y - true_y).^2)  # Use SSE, non-standard for log. reg.
    end

    println(optimize(objective, [3.0, 3.0, 3.0], autodiff = true, method = LBFGS()))

That gives me:

    Results of Optimization Algorithm
     * Algorithm: L-BFGS
     * Starting Point: [3.0,3.0,3.0]
     * Minimizer: [4.999999945789497,1.9999999853962256,4.0000000047769495]
     * Minimum: 0.000000
     * Iterations: 14
     * Convergence: true
       * |x - x'| < 1.0e-32: false
       * |f(x) - f(x')| / |f(x)| < 1.0e-08: false
       * |g(x)| < 1.0e-08: true
       * Exceeded Maximum Number of Iterations: false
     * Objective Function Calls: 53
     * Gradient Call: 53
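
The same autodiff call also works with the actual logistic-regression cost (cross-entropy) instead of SSE. Here is a sketch in the same style; it assumes X, y and λ are set up as in the first answer (with an intercept column in X), which is not part of the example above:

    # Sketch only: regularized cross-entropy (negative log-likelihood) cost,
    # minimized with the same autodiff + LBFGS call as above.
    function logistic_cost(β)
        m = length(y)
        h = 1 ./ (1 + exp(-X * β))
        return (1 / m) * sum(-y .* log(h) .- (1 - y) .* log(1 - h)) +
               (λ / (2 * m)) * sum(β[2:end] .^ 2)
    end

    res = optimize(logistic_cost, zeros(size(X, 2)), autodiff = true, method = LBFGS())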
