How does the subset argument work in the lm() function?

Question

I am trying to figure out how the subset argument works in R's lm() function. In particular, the following code puzzles me:

  data(mtcars)
  summary(lm(mpg ~ wt, data=mtcars))
  summary(lm(mpg ~ wt, cyl, data=mtcars))

In both cases the regression uses 32 observations:

  dim(lm(mpg ~ wt, cyl, data=mtcars)$model)
  [1] 32  2
  dim(lm(mpg ~ wt, data=mtcars)$model)
  [1] 32  2

Nevertheless, the coefficients differ (and so does R²). The help page does not provide much information on this subject:

subset: an optional vector specifying a subset of observations to be used in the fitting process.
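Part of the puzzle is that `cyl` ends up as the subset at all, which comes from R's positional argument matching rather than from anything specific to lm(). A minimal sketch, using the built-in `mtcars` data; the variable name `fit` is just for illustration. Because `data` is supplied by name, the unnamed `cyl` is matched to the next unmatched formal argument, which is `subset`, and the stored call records it that way.

```r
# lm()'s first three formal arguments: formula, data, subset
print(names(formals(lm))[1:3])

# data is passed by name, so the unnamed `cyl` is matched to `subset`
fit <- lm(mpg ~ wt, cyl, data = mtcars)

# The call stored by lm() (via match.call()) shows the matched argument
print(fit$call$subset)
```

Writing `subset = cyl` explicitly makes the intent visible and avoids relying on argument order.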

r linear-regression lm

Seb Jul 04 '12 at 11:17

1 answer

Ari B. Friedman · Accepted Answer · 2012-07-04T11:23:54+0000

As a general principle, a vector used for subsetting can be either logical (a TRUE or FALSE for each element) or numeric (the positions of the elements to keep). If it is numeric, R includes an element as many times as its position appears in the subsetting vector.
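The two indexing rules can be illustrated on a small vector and data frame. This is a minimal sketch; `x` and `df` are hypothetical toy objects, not part of the question.

```r
x <- c("a", "b", "c", "d")

# Logical index: keep the elements where the index is TRUE
print(x[c(TRUE, FALSE, TRUE, FALSE)])   # "a" "c"

# Numeric index: pick elements by position; a repeated position repeats the element
print(x[c(2, 2, 4)])                    # "b" "b" "d"

# The same rules apply to the rows of a data frame
df <- data.frame(id = 1:4, val = c(10, 20, 30, 40))
print(df[c(2, 2, 4), ])                 # rows 2, 2 and 4
```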

Let's look at cyl:

 > mtcars$cyl
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

So you get a data frame with the same number of rows (32), but it consists of row 6, row 6, row 4, row 6, and so on.
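One way to check this explanation is to compare the fit from the question with a fit on the data frame indexed by the same row numbers. A minimal sketch using the built-in `mtcars`; the names `fit_cyl` and `fit_rows` are illustrative. If the explanation holds, the coefficients match, and the 32-row model frame contains only three distinct cars (rows 4, 6 and 8).

```r
# The question's fit, with the numeric `cyl` column used as the subset
fit_cyl <- lm(mpg ~ wt, data = mtcars, subset = cyl)

# The same rows selected by hand: rows 6, 6, 4, 6, 8, ...
fit_rows <- lm(mpg ~ wt, data = mtcars[mtcars$cyl, ])

# Both fits see identical data, so the coefficients agree
print(all.equal(coef(fit_cyl), coef(fit_rows)))

# 32 observations, but only 3 distinct weights (rows 4, 6 and 8)
print(nrow(model.frame(fit_cyl)))
print(unique(model.frame(fit_cyl)$wt))

# How often each value (and hence row 4, 6 or 8) is repeated
print(table(mtcars$cyl))
```

In effect the "32 observations" are three cars, weighted by how often 4, 6 and 8 occur in `cyl`, which is why the coefficients and R² differ from the full fit.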

You can see this if you do the subsetting yourself:

 > head(mtcars[mtcars$cyl,])
                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
 Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
 Valiant.1      18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
 Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
 Valiant.2      18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
 Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
 Valiant.3      18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1

Did you mean to do something like this?

 summary(lm(mpg ~ wt, cyl==6, data=mtcars))
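With a logical condition such as `cyl == 6`, each row is kept at most once, so the model frame shrinks to the matching rows. A minimal sketch on the built-in `mtcars`; `fit6` and `fit6_df` are illustrative names, and the argument is named `subset` explicitly for clarity.

```r
# Logical subset: keep only the six-cylinder cars
fit6 <- lm(mpg ~ wt, data = mtcars, subset = cyl == 6)
print(nrow(model.frame(fit6)))   # 7

# Equivalent fit on an explicitly filtered data frame
fit6_df <- lm(mpg ~ wt, data = mtcars[mtcars$cyl == 6, ])
print(all.equal(coef(fit6), coef(fit6_df)))
```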
