Apply lm to the subset of the data frame defined by the third column of the frame

Question

I have a data frame containing a vector of x values, a vector of y values, and an identifier vector:

    x <- rep(0:3, 3)
    y <- runif(12)
    ID <- c(rep("a", 4), rep("b", 4), rep("c", 4))
    df <- data.frame(ID=ID, x=x, y=y)

I would like to fit a separate lm to each subset of x and y that shares the same identifier. The following code does the job:

    a.lm <- lm(x~y, data=subset(df, ID=="a"))
    b.lm <- lm(x~y, data=subset(df, ID=="b"))
    c.lm <- lm(x~y, data=subset(df, ID=="c"))

However, this is very fragile (future data sets may contain different identifiers) and it is not vectorized. I would also like to keep all the lm objects in a single data structure. There must be an elegant way to do this, but I cannot find it. Any help?

+6

vectorization r dataframe

Drew steen Sep 14 '11 at 10:17

3 answers

Using base functions, you can split your original data frame by ID and use lapply to fit a model to each piece:

    lapply(split(df,df$ID),function(d) lm(x~y,d))
    $a

    Call:
    lm(formula = x ~ y, data = d)

    Coefficients:
    (Intercept)            y
        -0.2334       2.8813

    $b

    Call:
    lm(formula = x ~ y, data = d)

    Coefficients:
    (Intercept)            y
         0.7558       1.8279

    $c

    Call:
    lm(formula = x ~ y, data = d)

    Coefficients:
    (Intercept)            y
          3.451       -7.628
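Once the fits are in a named list, pulling the same quantity out of every model is one more sapply call. A minimal sketch, assuming the data layout from the question; the set.seed call and the names fits, coefs and r2 are my own additions for illustration, not part of the answer above:

```r
# Rebuild the example data from the question.
# The seed is an assumption, added only so the run is reproducible.
set.seed(1)
df <- data.frame(
  ID = rep(c("a", "b", "c"), each = 4),
  x  = rep(0:3, 3),
  y  = runif(12)
)

# One lm per ID, stored in a list named by ID
fits <- lapply(split(df, df$ID), function(d) lm(x ~ y, data = d))

# Coefficients as a matrix: one column per ID, one row per coefficient
coefs <- sapply(fits, coef)
print(coefs)

# The same pattern works for any per-model summary, e.g. R-squared
r2 <- sapply(fits, function(m) summary(m)$r.squared)
print(r2)
```

Because split() builds the groups from whatever values ID contains, nothing here has to change when a new data set has different identifiers.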

+10

James Sep 14 '11 at 10:43

Use some of the magic in the plyr package. The dlply function takes a data.frame, splits it, applies a function to each piece, and combines the results into a list. This is perfect for your application.

    library(plyr)
    #fitList <- dlply(df, .(ID), function(dat)lm(x~y, data=dat))
    fitList <- dlply(df, .(ID), lm, formula=x~y) # Edit

This creates a list with a model for each subset of IDs:

    str(fitList, max.level=1)
    List of 3
     $ a:List of 12
      ..- attr(*, "class")= chr "lm"
     $ b:List of 12
      ..- attr(*, "class")= chr "lm"
     $ c:List of 12
      ..- attr(*, "class")= chr "lm"
     - attr(*, "split_type")= chr "data.frame"
     - attr(*, "split_labels")='data.frame': 3 obs. of 1 variable:

This means that you can subset the list and work with the individual models. For example, to get the coefficients of the lm model where ID=="a":

    > coef(fitList$a)
    (Intercept)           y
       3.071854   -3.440928
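If a table of coefficients is more convenient than a list, plyr's ldply can flatten the fitted list back into a data frame. A rough sketch, assuming plyr is installed; the seed and the name coefTable are made up for illustration:

```r
library(plyr)

# Example data as in the question; the seed is an assumption for reproducibility
set.seed(1)
df <- data.frame(
  ID = rep(c("a", "b", "c"), each = 4),
  x  = rep(0:3, 3),
  y  = runif(12)
)

# List of lm fits, one per ID (same idea as the dlply call above)
fitList <- dlply(df, .(ID), function(dat) lm(x ~ y, data = dat))

# Apply coef() to each fit and row-bind the results into one data frame
coefTable <- ldply(fitList, coef)
print(coefTable)
```

Each row of coefTable corresponds to one ID, so the result is easy to merge back onto other per-group data.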

+7

Andrie Sep 14 '11 at 10:28

Ben bolker · Accepted Answer · 2011-09-14T12:17:38+0000

What about

    library(nlme) ## OR library(lme4)
    lmList(x~y|ID,data=df)

?
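For what it's worth, here is how that might look end to end with nlme, which ships with standard R installations. This is only a sketch: the seed, the explicit factor() conversion of ID, and the names fitList, coefTable and fitA are assumptions of mine rather than part of the suggestion above.

```r
library(nlme)

# Example data as in the question; ID is made a factor explicitly here
set.seed(1)
df <- data.frame(
  ID = factor(rep(c("a", "b", "c"), each = 4)),
  x  = rep(0:3, 3),
  y  = runif(12)
)

# Fit x ~ y separately within each level of ID
fitList <- lmList(x ~ y | ID, data = df)

# coef() on the lmList gives one row of coefficients per ID
coefTable <- coef(fitList)
print(coefTable)

# Individual fits are ordinary lm objects and can be pulled out by name
fitA <- fitList[["a"]]
print(summary(fitA)$r.squared)
```

The grouping is written into the formula itself (x ~ y | ID), so no explicit split step is needed.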
