How to run a robit model in Stan?

I would like to run a robust logistic regression (robit) in Stan. The model is proposed in Gelman and Hill, “Data Analysis Using Regression and Multilevel/Hierarchical Models” (2006, p. 124), but I'm not sure how to implement it. I checked the Stan GitHub repository and the reference manual, but unfortunately I am still confused. Here is some code that I used to fit a regular logistic regression. What should I add to it so that the errors follow, say, a t distribution with 7 degrees of freedom? And would the same procedure apply if I ran a multilevel analysis?

    library(rstan)

    set.seed(1)
    x1 <- rnorm(100)
    x2 <- rnorm(100)
    z  <- 1 + 2*x1 + 3*x2
    pr <- 1/(1+exp(-z))
    y  <- rbinom(100, 1, pr)

    df <- list(N=100, y=y, x1=x1, x2=x2)

    # Stan code
    model1 <- '
    data {
      int<lower=0> N;
      int<lower=0,upper=1> y[N];
      vector[N] x1;
      vector[N] x2;
    }
    parameters {
      real beta_0;
      real beta_1;
      real beta_2;
    }
    model {
      y ~ bernoulli_logit(beta_0 + beta_1 * x1 + beta_2 * x2);
    }
    '

    # Run the model
    fit <- stan(model_code = model1, data = df, iter = 1000, chains = 4)
    print(fit)

Thanks!

+6
4 answers

I may be missing something, but I was not able to adapt the solution from Luc that danilofreire posted. So I just translated the model from JAGS.

I think this is correct, although it is slightly different from Luc's solution.

    library(rstan)

    N  <- 100
    x1 <- rnorm(N)
    x2 <- rnorm(N)
    beta0 <- 1
    beta1 <- 2
    beta2 <- 3

    eta <- beta0 + beta1*x1 + beta2*x2  # linear predictor
    p   <- 1/(1 + exp(-eta))            # inv-logit
    y   <- rbinom(N, 1, p)

    # nu = degrees of freedom of the t link; adjust as desired
    dlist <- list(y = y, x1 = x1, x2 = x2, N = N, nu = 3)

    mod_string <- "
    data{
      int<lower=0> N;
      vector[N] x1;
      vector[N] x2;
      int<lower=0, upper=1> y[N];
      real nu;
    }
    parameters{
      real beta0;
      real beta1;
      real beta2;
    }
    model{
      vector[N] pi;
      for(i in 1:N){
        // robit link: t CDF of the linear predictor
        // ('=' replaces the deprecated '<-' assignment; needs Stan >= 2.10)
        pi[i] = student_t_cdf(beta0 + beta1*x1[i] + beta2*x2[i], nu, 0, 1);
        y[i] ~ bernoulli(pi[i]);
      }
    }
    "

    fit1 <- stan(model_code = mod_string, data = dlist, chains = 3, iter = 1000)
    print(fit1)
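
To see what the t link does compared with the logit, here is a minimal base R sketch (no Stan needed). It assumes that R's `pt(eta, df = nu)` is the same quantity as `student_t_cdf(eta, nu, 0, 1)`; the grid `eta` is made up for illustration, and no rescaling is applied to make the two links comparable.

```r
# Hypothetical grid of linear-predictor values (illustration only)
eta <- c(-6, -3, -1, 0, 1, 3, 6)
nu  <- 3  # degrees of freedom, matching nu in the data list above

p_robit <- pt(eta, df = nu)  # standard t CDF with nu degrees of freedom
p_logit <- plogis(eta)       # inverse logit, for comparison

print(round(data.frame(eta, p_robit, p_logit), 4))
```

In the far tails (here at eta = -6 and 6) the t link keeps the probabilities further from 0 and 1 than the logit does, which is where the robustness to outlying observations comes from.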
+5

Luc Coffeng sent me this answer on the Stan mailing list, and I thought I should add it here. He said:

"Adopt a GLM as the basis for your robit regression: simply replace the standard error term with e ~ student_t(7, 0, sigma_e), where sigma_e ~ cauchy(0, 2) or whatever scale you think is right (I wouldn't go beyond 5 anyway, as the inverse logit of (-5, 5) covers most of the [0, 1] interval). Besides the scale of the t-error, you could also specify the df of the t-error as a parameter. See below for suggested code.

However, I hope that your data contains more information than the toy example you provided, i.e. several observations per person (as shown below). With just one observation per person/unit, the model is almost impossible to identify."

He then presented the following example:

    library(rstan)

    set.seed(1)
    x1 <- rnorm(100)
    x2 <- rnorm(100)
    z  <- 1 + 2*x1 + 3*x2 + 0.1 * rt(100, 7)
    pr <- 1/(1+exp(-z))
    y  <- rbinom(100, 10, pr)

    df <- list(N=100, y=y, x1=x1, x2=x2, nu = 7)

    # Stan code
    model1 <- '
    data {
      int<lower=0> N;
      int<lower=0,upper=10> y[N];
      vector[N] x1;
      vector[N] x2;
      real nu;
    }
    parameters {
      real beta_0;
      real beta_1;
      real beta_2;
      real<lower=0> sigma_e;
      vector[N] e;
    }
    model {
      e ~ student_t(nu, 0, sigma_e);
      sigma_e ~ cauchy(0, 1);
      y ~ binomial_logit(10, beta_0 + beta_1 * x1 + beta_2 * x2 + e);
    }
    '

    # Run the model
    fit <- stan(model_code = model1, data = df, iter = 4000, chains = 2)
    print(fit)
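
The remark about not letting the error scale go beyond 5 is easy to check in base R. This is just a quick sketch of the arithmetic, using `plogis` as the inverse logit:

```r
# Inverse logit at the ends of the interval (-5, 5)
bounds <- plogis(c(-5, 5))
print(round(bounds, 4))   # [1] 0.0067 0.9933

# Share of the [0, 1] probability scale covered by that interval
coverage <- diff(bounds)
print(round(coverage, 4))
```

So an error term that moves the linear predictor by about 5 in either direction can already push the probability almost all the way to 0 or 1.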

Bob Carpenter also briefly commented on the question:

"[...] And yes, you can do the same in a hierarchical setting, but you have to be careful, because modeling the degrees of freedom can be tricky given that their scale runs off to infinity as you approach normality."

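One way to read this comment, as a rough base R sketch: the t CDF gets closer and closer to the normal CDF as the degrees of freedom grow, so large values of nu become nearly indistinguishable from one another. The grid `x` and the set `nus` are arbitrary choices for illustration.

```r
# Largest gap between the t CDF and the normal CDF on a grid,
# for a few hypothetical degrees-of-freedom values
x   <- seq(-6, 6, by = 0.01)
nus <- c(1, 3, 7, 30, 1000)

max_gap <- sapply(nus, function(nu) max(abs(pt(x, df = nu) - pnorm(x))))
print(data.frame(nu = nus, max_gap = signif(max_gap, 3)))
```

The gap shrinks steadily with nu, which suggests why the data carry little information about the degrees of freedom once they are large.
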
When Bernd asked, Luc explained why he wrote y ~ binomial_logit(10, ... in the model code:

“In the example code, 10 is the sample size. You may have noticed that the toy data contain multiple observations per individual/unit (i.e. 10 observations per unit).”

The Stan manual also provides extensive information about function arguments and sampling statements.
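
For what it's worth, the role of the 10 can be sketched in base R. This assumes that `binomial_logit(10, eta)` corresponds to a binomial with 10 trials and success probability `plogis(eta)`; the values of `eta` and `y` below are made up.

```r
# One hypothetical unit: 7 successes out of 10 trials
eta      <- 0.5
y        <- 7
n_trials <- 10

p <- plogis(eta)  # inverse logit of the linear predictor

# Log-likelihood contribution via the binomial density ...
ll_binom <- dbinom(y, size = n_trials, prob = p, log = TRUE)

# ... and the same quantity written out by hand
ll_manual <- lchoose(n_trials, y) + y * log(p) + (n_trials - y) * log1p(-p)

print(c(ll_binom = ll_binom, ll_manual = ll_manual))
```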

+4

Update: My translation of the johnmyleswhite example into Stan syntax does not work. I don't understand Stan syntax well enough to translate the code. Maybe someone can help? Below is the original answer.

If you check the johnmyleswhite example mentioned by jbaums, you will see that the important piece of code is:

    y[i] ~ dbern(p[i])
    p[i] <- pt(z[i], 0, 1, 1)
    z[i] <- a * x[i] + b

As you can see, instead of using invlogit to calculate the probabilities, it uses the t distribution (in fact, the cumulative t). In Stan, just use:

 student_t_cdf 

I don't know Stan's syntax well, but I assume you can use something like the following:

    model {
      y ~ bernoulli(theta);
      theta <- student_t_cdf(df, mu, sigma)
      mu <- beta_0 + beta_1 * x1 + beta_2 * x2;
    }

Note that you will need to put priors on df and sigma. Something like:

    df_inv ~ uniform(0, 0.5);
    df <- 1 / df_inv;
    sigma_z <- sqrt((df-2)/df);
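
As far as I can tell, the point of `sigma_z` is to rescale the t distribution to unit variance: a t variable with df > 2 has variance df / (df - 2), and keeping `df_inv` below 0.5 keeps df above 2. A base R sketch to check this numerically, with a made-up `df_inv`:

```r
# Hypothetical value inside (0, 0.5); corresponds to df = 7
df_inv  <- 1 / 7
df      <- 1 / df_inv
sigma_z <- sqrt((df - 2) / df)

# E[T^2] for T ~ t(df), by numerical integration (the mean is 0, so this is the variance)
second_moment <- integrate(function(t) t^2 * dt(t, df = df), -Inf, Inf)$value

# Variance of sigma_z * T; should come out close to 1
scaled_var <- sigma_z^2 * second_moment
print(c(df = df, sigma_z = sigma_z, scaled_var = scaled_var))
```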

I will try to see whether this works. Let me know if I should adjust my answer so that it works.

+1

Page 26 of the Stan 2.4 Reference Manual:

 y ~ bernoulli(Phi( beta_0 + beta_1 * x1 + beta_2 * x2 )) 

The general solution is y ~ bernoulli(link_function(eta)), where link_function is, for example, Phi. bernoulli_logit is simply a special function that wraps this functionality for the logit link and is numerically more stable.

I recommend reading up on generalized linear models if the reason for this is not clear. The Wikipedia page is not a bad overview.
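
To make the pattern concrete outside Stan, here is a small base R sketch. `bernoulli_loglik` is a hypothetical helper written for this post, not a Stan or rstan function; it evaluates the Bernoulli log-likelihood for whatever inverse link you pass in. The data mimic the toy example from the question.

```r
# Hypothetical helper: Bernoulli log-likelihood for an arbitrary inverse link
bernoulli_loglik <- function(beta, X, y, inv_link) {
  eta <- drop(X %*% beta)   # linear predictor
  p   <- inv_link(eta)      # probabilities implied by the chosen link
  sum(dbinom(y, size = 1, prob = p, log = TRUE))
}

# Toy data in the spirit of the question (intercept plus two predictors)
set.seed(1)
X    <- cbind(1, rnorm(100), rnorm(100))
beta <- c(1, 2, 3)
y    <- rbinom(100, 1, plogis(drop(X %*% beta)))

# Same likelihood code, three different links
ll <- c(
  probit = bernoulli_loglik(beta, X, y, pnorm),
  logit  = bernoulli_loglik(beta, X, y, plogis),
  robit7 = bernoulli_loglik(beta, X, y, function(eta) pt(eta, df = 7))
)
print(ll)
```

Only the function mapping eta to a probability changes between the three models; swapping `Phi` for `student_t_cdf` in the Stan statement is the same idea.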

+1

Source: https://habr.com/ru/post/976925/

