Linear regression with coefficient constraints

Question

I am trying to do linear regression for a model like this:

Y = aX1 + bX2 + c

So Y ~ X1 + X2

Suppose I have the following response vector:

 set.seed(1)
 Y <- runif(100, -1.0, 1.0)

And the following matrix of predictors:

 X1 <- runif(100, 0.4, 1.0)
 X2 <- sample(rep(0:1, each=50))
 X <- cbind(X1, X2)

I want to impose the following constraints on the coefficients:

 a + c >= 0
 c >= 0

So there are no restrictions on b.

I know that the glmc package can be used to apply constraints, but I could not work out how to apply it to mine. I also know that contr.sum can be used so that all coefficients sum to 0, for example, but that is not what I want to do. solve.QP() seems like another possibility, where setting meq=0 can be used so that all coefficients are >= 0 (again, not my goal here).

Note: The solution must be able to handle NA values in the response vector Y, for example:

 Y <- runif(100, -1.0, 1.0)
 Y[c(2,5,17,56,37,56,34,78)] <- NA

+5

r linear-regression quadratic-programming

arielle Aug 08 '17 at 20:35

1 answer

josliber · Accepted Answer · 2017-08-08T21:57:15+0000

solve.QP can be passed arbitrary linear constraints, so it can certainly be used to model the constraints a + c >= 0 and c >= 0.

First, we can add a column of 1s to X to capture the intercept term, and then we can replicate standard linear regression with solve.QP:

 X2 <- cbind(X, 1)
 library(quadprog)
 solve.QP(t(X2) %*% X2, t(Y) %*% X2, matrix(0, 3, 0), c())$solution
 # [1] 0.08614041 0.21433372 -0.13267403

With the sample data from the question, neither constraint is satisfied by standard linear regression: a + c is about -0.047 and c is about -0.133, both negative.
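For readers wondering why those two matrix products reproduce ordinary least squares, here is a short derivation. It relies on the objective that quadprog documents for solve.QP, namely minimizing (1/2) b'Db - d'b subject to A'b >= b_0. The symbols X (the design matrix including the column of 1s, called X2 in the code above) and \beta = (a, b, c) are notation introduced here for illustration.

```latex
\begin{aligned}
\tfrac{1}{2}\lVert Y - X\beta \rVert^2
  &= \tfrac{1}{2}\,\beta^\top (X^\top X)\,\beta \;-\; (X^\top Y)^\top \beta \;+\; \tfrac{1}{2}\,Y^\top Y, \\
\text{so}\quad D &= X^\top X, \qquad d = X^\top Y .
\end{aligned}
```

The term (1/2) Y'Y does not depend on \beta, so dropping it leaves the minimizer unchanged. Passing an Amat with zero columns means there are no constraints, which is why the result should match the ordinary least-squares coefficients.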

By modifying the Amat and bvec arguments, we can add our two constraints:

 solve.QP(t(X2) %*% X2, t(Y) %*% X2, cbind(c(1, 0, 1), c(0, 0, 1)), c(0, 0))$solution
 # [1] 0.0000000 0.1422207 0.0000000

Subject to these constraints, the squared residuals are minimized by setting the coefficients a and c to 0.
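As a sanity check on the constrained fit, the sketch below rebuilds the question's data and confirms that each column of Amat encodes one constraint and that the solution satisfies both. The names Xd, slack, and tol are illustrative and not part of the original answer; exact coefficient values may differ from those quoted above depending on the R version, because the default behaviour of sample() changed in R 3.6.0.

```r
library(quadprog)

# Rebuild the example data from the question
set.seed(1)
Y  <- runif(100, -1.0, 1.0)
X1 <- runif(100, 0.4, 1.0)
X2 <- sample(rep(0:1, each = 50))
Xd <- cbind(X1, X2, 1)        # design matrix; coefficient order is (a, b, c)

# Each COLUMN of Amat is one constraint of the form t(column) %*% coef >= bvec[i]
Amat <- cbind(c(1, 0, 1),     # 1*a + 0*b + 1*c >= 0
              c(0, 0, 1))     # 0*a + 0*b + 1*c >= 0
bvec <- c(0, 0)

fit <- solve.QP(crossprod(Xd), drop(crossprod(Xd, Y)), Amat, bvec)

# Slack of each constraint: non-negative (up to rounding) means it is satisfied
slack <- drop(crossprod(Amat, fit$solution)) - bvec
tol <- 1e-8
all(slack >= -tol)
```

A slack of (numerically) zero indicates that the corresponding constraint is active, meaning the fit sits on the boundary of the feasible region.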

You can handle missing values in Y or X2 the same way the lm function does, by removing the offending observations. As a preprocessing step, you could do something like the following:

 has.missing <- rowSums(is.na(cbind(Y, X2))) > 0
 Y <- Y[!has.missing]
 X2 <- X2[!has.missing,]
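Putting the pieces together, one possible way to package the approach is a small helper that drops incomplete rows and then calls solve.QP. This is only a sketch: constrained_lm is a hypothetical name introduced here, not a function from quadprog, and it uses complete.cases in place of the rowSums idiom above.

```r
library(quadprog)

# Hypothetical helper (not part of quadprog): least squares of y on X plus an
# intercept, subject to t(Amat) %*% coef >= bvec. Rows with any NA are dropped.
constrained_lm <- function(y, X, Amat, bvec) {
  Xd   <- cbind(X, intercept = 1)          # append the column of 1s
  keep <- complete.cases(y, Xd)            # TRUE for rows with no missing values
  y    <- y[keep]
  Xd   <- Xd[keep, , drop = FALSE]
  fit  <- solve.QP(crossprod(Xd), drop(crossprod(Xd, y)), Amat, bvec)
  setNames(fit$solution, colnames(Xd))
}

# Data from the question, including the NA values in the response
set.seed(1)
Y <- runif(100, -1.0, 1.0)
Y[c(2, 5, 17, 56, 37, 56, 34, 78)] <- NA
X1 <- runif(100, 0.4, 1.0)
X2 <- sample(rep(0:1, each = 50))

# Constraints a + c >= 0 and c >= 0, with coefficient order (a, b, c)
coefs <- constrained_lm(Y, cbind(X1, X2),
                        Amat = cbind(c(1, 0, 1), c(0, 0, 1)),
                        bvec = c(0, 0))
coefs
```

The returned vector is named X1, X2, intercept, corresponding to a, b, and c in the question's notation.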
