How does glmnet calculate the maximum lambda value?

Question

How does glmnet calculate the maximum lambda value?

The glmnet package uses the LASSO range of lambda settings, scaled from the maximum lambda_max , in which no predictors are selected. I want to know how glmnet calculates this lambda_max value. For example, in a trivial dataset:

 set.seed(1) library("glmnet") x <- matrix(rnorm(100*20),100,20) y <- rnorm(100) fitGLM <- glmnet(x,y) max(fitGLM$lambda) # 0.1975946

The batch vignette ( http://www.jstatsoft.org/v33/i01/paper ) describes in section 2.5 that it calculates this value as follows:

 sx <- as.matrix(scale(x)) sy <- as.vector(scale(y)) max(abs(colSums(sx*sy)))/100 # 0.1865232

Which is clearly close, but not the same. So what makes this difference? And in a related question, how could I calculate lambda_max for logistic regression?

+6

r lasso glmnet

Sacha epskamp Aug 12 '14 at 6:43

source share

4 answers

According to help("glmnet") maximum lambda value is "the smallest value for which all coefficients are zero":

 sum(fitGLM$beta[, which.max(fitGLM$lambda)]) #[1] 0 sum(glmnet(x,y, lambda=max(fitGLM$lambda)*0.999)$beta) #[1] -0.0001809804

With a quick glance, the value is apparently computed by Fortran code called by elnet .

+1

Rolling Aug 12 '14 at 9:44

source share

For your second question, look at an article by Friedman et al. "Regularization paths for generalized coordinated descent linear models . " In particular, see Equation (10), which is equal to equality in equilibrium. Just check the conditions under which the numerator $ S (\ cdot, \ cdot) $ is zero for all parameters.

+1

Treestump Dec 11 '15 at 0:23

source share

It seems that lambda_max for logistic regression is calculated similarly, with weights based on class proportions:

 set.seed(1) library("glmnet") x <- matrix(rnorm(100*20),100,20) y <- rnorm(100) mysd <- function(y) sqrt(sum((y-mean(y))^2)/length(y)) sx <- scale(x, scale=apply(x, 2, mysd)) sx <- as.matrix(sx, ncol=20, nrow=100) y_bin <- factor(ifelse(y<0, -1, 1)) prop.table(table(y_bin)) # y_bin # -1 1 # 0.62 0.38 fitGLM_log <- glmnet(sx, y_bin, family = "binomial") max(fitGLM_log$lambda) # [1] 0.1214006 max(abs(colSums(sx*ifelse(y<0, -.38, .62))))/100 # [1] 0.1214006

+1

Marjolein fokkema Jun 29 '17 at 22:44

source share

Gaby cohen · Accepted Answer · 2014-10-28T19:50:47+0000

To get the same result, you need to standardize the variables using the standard deviation with n instead of the denominator n-1 .

 mysd <- function(y) sqrt(sum((y-mean(y))^2)/length(y)) sx <- scale(x,scale=apply(x, 2, mysd)) sx <- as.matrix(sx, ncol=20, nrow=100) sy <- as.vector(scale(y, scale=mysd(y))) max(abs(colSums(sx*sy)))/100 ## [1] 0.1758808 fitGLM <- glmnet(sx,sy) max(fitGLM$lambda) ## [1] 0.1758808

How does glmnet calculate the maximum lambda value?

More articles: