cv.glmnet fails for ridge, but not for lasso, on simulated data with coder error

Question

Gist

Error: Error in predmat[which, seq(nlami)] = preds : replacement has length zero

Context: the data are simulated with a binary y, but the true y is observed through n coders. The data are stacked n times and a model is fitted to the coded labels, trying to recover the true y.

The error occurs:

with the L2 penalty, but not with the L1 penalty;
when y is the coders' y, but not when it is the true y;
not deterministically: whether it appears depends on the seed.

UPDATE: the error occurs in versions after 1.9-8; version 1.9-8 does not fail.

Replicate

basic data:

    library(glmnet)
    rm(list = ls())
    set.seed(123)
    num_obs <- 4000
    n_coders <- 2
    precision <- .8
    X <- matrix(rnorm(num_obs * 20, sd = 1), nrow = num_obs)
    prob1 <- plogis(X %*% c(2, -2, 1, -1, rep(0, 16)))  # yes many zeros, ignore
    y_true <- rbinom(num_obs, 1, prob1)
    dat <- data.frame(y_true = y_true, X = X)

create coders

    classify <- function(true_y, precision) {
      n <- length(true_y)
      y_coder <- numeric(n)
      # a coder labels a true 1 as 1 with probability `precision`
      y_coder[which(true_y == 1)] <- rbinom(n = length(which(true_y == 1)), size = 1, prob = precision)
      # and labels a true 0 as 1 with probability 1 - `precision`
      y_coder[which(true_y == 0)] <- rbinom(n = length(which(true_y == 0)), size = 1, prob = (1 - precision))
      return(y_coder)
    }
    y_codings <- sapply(rep(precision, n_coders), classify, true_y = dat$y_true)

collect everything

    expanded_data <- do.call(rbind, rep(list(dat), n_coders))
    expanded_data$y_codings <- matrix(y_codings, ncol = 1)

reproduce the error

Since the error depends on the seed, a loop is needed. Only the first loop fails; the other two run to completion.

    X <- as.matrix(expanded_data[, grep("X", names(expanded_data))])
    for (i in 1:1000) cv.glmnet(x = X, y = expanded_data$y_codings, family = "binomial", alpha = 0)  # will fail
    for (i in 1:1000) cv.glmnet(x = X, y = expanded_data$y_codings, family = "binomial", alpha = 1)  # will not fail
    for (i in 1:1000) cv.glmnet(x = X, y = expanded_data$y_true, family = "binomial", alpha = 0)  # will not fail

Any thoughts on where this comes from in glmnet and how to avoid it? From my reading of cv.glmnet, the failure happens after the cross-validation fits, inside the call cvstuff = do.call(fun, list(outlist, lambda, x, y, weights, offset, foldid, type.measure, grouped, keep)). I do not understand what that call does, and therefore neither why it fails nor how to avoid it.
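For reference, the message "replacement has length zero" is an ordinary base-R assignment error: it is raised when a non-empty index is assigned a zero-length value. The sketch below only illustrates that mechanism with made-up objects (predmat_demo, rows and empty_preds are hypothetical names, not glmnet internals); it does not claim to show why the prediction ends up empty inside cv.glmnet.

```r
# Hypothetical stand-ins; these are NOT glmnet's internal objects.
predmat_demo <- matrix(NA_real_, nrow = 4, ncol = 3)
rows <- c(TRUE, FALSE, TRUE, FALSE)   # rows belonging to one fold
empty_preds <- numeric(0)             # a prediction result with no values

# Assigning a zero-length value to a non-empty selection is an error in base R.
res <- tryCatch(
  {
    predmat_demo[rows, 1:2] <- empty_preds
    "assignment succeeded"
  },
  error = function(e) conditionMessage(e)
)
print(res)
```

In other words, an error of this kind suggests that the object on the right-hand side came back empty for at least one fold, which is a starting point for debugging rather than a diagnosis.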

Session info (Ubuntu and PC)

    R version 3.3.1 (2016-06-21)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 16.04.1 LTS

    locale:
     [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
     [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
    [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] glmnet_2.0-2 foreach_1.4.3 Matrix_1.2-7.1 devtools_1.12.0

    loaded via a namespace (and not attached):
     [1] httr_1.2.1 R6_2.2.0 tools_3.3.1 withr_1.0.2 curl_2.1
     [6] memoise_1.0.0 codetools_0.2-15 grid_3.3.1 iterators_1.0.8 knitr_1.14
    [11] digest_0.6.10 lattice_0.20-34

and

    R version 3.3.1 (2016-06-21)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1

    locale:
    [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] glmnet_2.0-2 foreach_1.4.3 Matrix_1.2-7.1 devtools_1.12.0

    loaded via a namespace (and not attached):
     [1] httr_1.2.1 R6_2.2.0 tools_3.3.1 withr_1.0.2 curl_2.1
     [6] memoise_1.0.0 codetools_0.2-15 grid_3.3.1 iterators_1.0.8 digest_0.6.10
    [11] lattice_0.20-34

+6

r glmnet

Elad663 Oct 20 '16 at 3:57

2 answers

user2173836 · Answer 1 · 2017-01-27T13:45:44+0000

I had the same error in glmnet_2.0-5. It is somehow related to the way the lambda sequence is generated automatically in some situations. The workaround is to provide your own lambdas,

e.g.:

    cv.glmnet(x = X, y = expanded_data$y_codings, family = "binomial", alpha = 0,
              lambda = exp(seq(log(0.001), log(5), length.out = 100)))

Credit: https://github.com/lmweber/glmnet-error-example/blob/master/glmnet_error_example.R
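The grid in that call can also be built and checked on its own before it is passed to cv.glmnet. The sketch below is only an illustration: the endpoints 0.001 and 5 and the length 100 are taken from the call above, the name lambda_grid is hypothetical, and the grid is written in decreasing order on the assumption (based on the glmnet documentation's advice to supply a decreasing sequence) that this is the preferred form. The cv.glmnet call is guarded so that it runs only when glmnet is installed and the question's X and expanded_data exist.

```r
# Log-spaced grid from 5 down to 0.001 (endpoints taken from the answer's call).
lambda_grid <- exp(seq(log(5), log(0.001), length.out = 100))

# Sanity checks on the grid before handing it to cv.glmnet.
stopifnot(length(lambda_grid) == 100,
          all(diff(lambda_grid) < 0),   # strictly decreasing
          all(lambda_grid > 0))

# Assumes X and expanded_data are defined as in the question; skipped otherwise.
if (requireNamespace("glmnet", quietly = TRUE) &&
    exists("X") && exists("expanded_data")) {
  fit <- glmnet::cv.glmnet(x = X, y = expanded_data$y_codings,
                           family = "binomial", alpha = 0,
                           lambda = lambda_grid)
  print(fit$lambda.min)
}
```

Whether this particular range suits a given data set is a judgment call; if the selected lambda sits at either end of the grid, the range probably needs widening.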

Hong Ooi · Answer 2 · 2016-10-20T13:45:01+0000

Well, I just ran the first loop and it completed successfully. This is with glmnet 2.0-2.

This is more of a comment, but it is too long to fit in one: when you run tests that depend on random numbers, you can save the seed as you go. This lets you jump straight to a given point in the test run without rerunning everything from the start each time.

Something like this:

    results <- lapply(1:1000, function(x) {
      seed <- .Random.seed
      res <- try(glmnet(x, y, ...))  # so the code keeps running even if there is an error
      attr(res, "seed") <- seed
      res
    })

Now you can check whether any of the runs failed by looking at the class of the result:

    errs <- sapply(results, function(x) inherits(x, "try-error"))
    any(errs)

And you can repeat those runs that failed:

    firstErr <- which(errs)[1]
    .Random.seed <- attr(results[[firstErr]], "seed")
    glmnet(x, y, ...)  # try failed run again
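The same pattern can be tried end to end without glmnet. The following is a minimal sketch in which flaky_fit is a hypothetical stand-in for the model call (it fails whenever a uniform draw exceeds 0.9) and the seed 42 is arbitrary. Two details differ from the snippets above and are my own choices: the RNG state is stored next to the result in a list rather than as an attribute, and it is read and restored explicitly in the global environment so that the code behaves the same when run inside a function.

```r
set.seed(42)  # arbitrary seed; also guarantees that .Random.seed exists

# Hypothetical stand-in for a model fit: fails when a random draw is large.
flaky_fit <- function() {
  u <- runif(1)
  if (u > 0.9) stop("simulated failure")
  u
}

results <- lapply(1:200, function(i) {
  seed <- get(".Random.seed", envir = globalenv())  # RNG state before the call
  res <- try(flaky_fit(), silent = TRUE)            # keep going after an error
  list(value = res, seed = seed)
})

# Which runs failed?
errs <- vapply(results, function(r) inherits(r$value, "try-error"), logical(1))

if (any(errs)) {
  first_err <- which(errs)[1]
  # Restore the RNG state recorded just before the failing run, then rerun it.
  assign(".Random.seed", results[[first_err]]$seed, envir = globalenv())
  replay <- try(flaky_fit(), silent = TRUE)
  print(inherits(replay, "try-error"))  # the replay hits the same failure
}
```

Because the replay starts from the recorded RNG state, it draws the same number and fails in the same way, which is what makes a seed-dependent error like the one in the question debuggable.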

Session Information:

    R version 3.2.2 (2015-08-14)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 8 x64 (build 9200)

    locale:
    [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.850
    [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] glmnetUtils_0.55 RevoUtilsMath_8.0.3 RevoUtils_8.0.3 RevoMods_8.0.3 RevoScaleR_8.0.6
    [6] lattice_0.20-33 rpart_4.1-10

    loaded via a namespace (and not attached):
    [1] Matrix_1.2-2 parallel_3.2.2 codetools_0.2-14 rtvs_1.0.0.0 grid_3.2.2
    [6] iterators_1.0.8 foreach_1.4.3 glmnet_2.0-2

(It should be Windows 10, not 8; R 3.2.2 does not know about Win10)
