R tilde operator: What does ~ 0 + a mean?

I saw how to use the ~ operator in a formula. For example, y~x means: y is distributed as x.

However, I am really confused by what ~0+a means in this code:

    require(limma)
    a = factor(1:3)
    model.matrix(~0+a)

Why does model.matrix(a) on its own not work? Why is the result of model.matrix(~a) different from model.matrix(~0+a)? And finally, what is the purpose of the ~ operator here?

+4
3 answers

~ creates a formula: it separates the left-hand and right-hand sides of the formula.

From ?`~`

Tilde is used to separate the left- and right-hand sides in a model formula.
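To see this concretely, here is a minimal sketch that takes a formula apart into its two sides. The names y, x and a are only symbols here and do not need to exist as variables.

```r
# A two-sided formula: response on the left, predictor on the right
f <- y ~ x
class(f)    # "formula"
f[[2]]      # left-hand side: the symbol y
f[[3]]      # right-hand side: the symbol x

# A one-sided formula has only a right-hand side
g <- ~ 0 + a
length(g)   # 2: the `~` call itself plus one argument
g[[2]]      # right-hand side: 0 + a
```

A formula is just an unevaluated expression, which is why it can be created before the variables it mentions are defined.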

Quoting from the help for formula (?formula):

The models fit by, e.g., the lm and glm functions are specified in a compact symbolic form. The ~ operator is basic in the formation of such models. An expression of the form y ~ model is interpreted as a specification that the response y is modelled by a linear predictor specified symbolically by model. Such a model consists of a series of terms separated by + operators. The terms themselves consist of variable and factor names separated by : operators. Such a term is interpreted as the interaction of all the variables and factors appearing in the term.

In addition to + and :, a number of other operators are useful in model formulae. The * operator denotes factor crossing: a*b is interpreted as a + b + a:b. The ^ operator indicates crossing to the specified degree. For example, (a+b+c)^2 is identical to (a+b+c)*(a+b+c), which in turn expands to a formula containing the main effects for a, b and c together with their second-order interactions. The %in% operator indicates that the terms on its left are nested within those on the right. For example, a + b %in% a expands to the formula a + a:b. The - operator removes the specified terms, so that (a+b+c)^2 - a:b is identical to a + b + c + b:c + a:c. It can also be used to remove the intercept term: when fitting a linear model, y ~ x - 1 specifies a line through the origin. A model with no intercept can also be specified as y ~ x + 0 or y ~ 0 + x.
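If you want to check how R expands these operators, one way is to inspect the "term.labels" attribute of terms(). This is only a sketch; expand_terms is a helper name made up for this example, not part of R.

```r
# Hypothetical helper: return the expanded term labels of a formula
expand_terms <- function(f) attr(terms(f), "term.labels")

expand_terms(~ a * b)                 # "a" "b" "a:b"
expand_terms(~ (a + b + c)^2)         # main effects plus all two-way interactions
expand_terms(~ (a + b + c)^2 - a:b)   # the same terms with a:b removed
```

terms() works on the symbols alone, so a, b and c do not have to be defined.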

So, regarding the specific question about ~0+a (or, equivalently, ~a+0):

  • You are creating a model matrix without an intercept. Because a is a factor, model.matrix(~a) returns an intercept column that takes the place of a1 (you need only n-1 indicators to fully specify n classes).
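The difference is easiest to see by comparing the column names of the two matrices. This sketch assumes R's default treatment contrasts for unordered factors; mm_int and mm_noint are names chosen for this example.

```r
a <- factor(1:3)

mm_int   <- model.matrix(~ a)       # with intercept
mm_noint <- model.matrix(~ 0 + a)   # without intercept

colnames(mm_int)     # "(Intercept)" "a2" "a3"
colnames(mm_noint)   # "a1" "a2" "a3"

mm_int     # the intercept column is all 1s; level 1 is the baseline
mm_noint   # one indicator column per level
```

With the intercept, level 1 is absorbed into the baseline and a2 and a3 are offsets from it. Without the intercept, each level gets its own indicator column.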

Help files for each function are well written, detailed and easy to find!

Why model.matrix(a) does not work

model.matrix(a) does not work because a is a factor variable, not a formula or terms object.

From the help for model.matrix:

object: an object of an appropriate class. For the default method, a model formula or a terms object.

R is looking for a particular class of object; by passing the formula ~a you are passing an object of class formula. model.matrix(terms(~a)) will also work (passing the terms object corresponding to the formula ~a).
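A small sketch of the class check described above. The error from model.matrix(a) is caught with tryCatch so the script keeps running; result and mm_terms are names made up for this example.

```r
a <- factor(1:3)

class(a)            # "factor"
class(~ a)          # "formula"
class(terms(~ a))   # "terms" "formula"

# Passing the bare factor fails; capture the error instead of stopping
result <- tryCatch(model.matrix(a), error = function(e) e)
inherits(result, "error")   # TRUE

# Passing a terms object works, just like passing the formula
mm_terms <- model.matrix(terms(~ a))
colnames(mm_terms)
```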


A general note

As @BenBolker helpfully notes in his comment, this is a modified version of Wilkinson-Rogers notation.

There is a good description in An Introduction to R.

+10

After reading several manuals, I was still confused about the meaning of model.matrix(~0+x) until recently, when I found this excellent book chapter.

In mathematics, 0+a is equal to a, so writing a term like 0+a looks very strange. However, here we are dealing with linear models: a simple high-school equation such as y=ax+b that describes the relationship between the predictor variable (x) and the observation (y).

Thus, we can think of ~0+x, or equivalently ~x+0, as an equation of the form y=ax+b. By adding 0, we force b to be zero, which means that we are looking for a line passing through the origin (no intercept). If we specified a model such as ~x+1, or simply ~x, the fitted equation could contain a non-zero term b. Equivalently, we can constrain b with the formula ~x-1 or ~-1+x, both of which mean no intercept (in the same way that a negative index excludes a row or column in R). However, something like ~x-2 or ~x+3 is meaningless.
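As a rough illustration, the sketch below fits both kinds of model with lm() on a few made-up data points (the values are invented for this example) and checks which coefficients come back.

```r
# Made-up illustrative data, roughly y = 2x
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

fit_with    <- lm(y ~ x)       # estimates both slope a and intercept b
fit_without <- lm(y ~ 0 + x)   # forces b = 0

names(coef(fit_with))      # "(Intercept)" "x"
names(coef(fit_without))   # "x"

# All of these spellings drop the intercept:
# the "intercept" attribute of terms() is 0 for each of them
no_int <- list(~ 0 + x, ~ x + 0, ~ x - 1, ~ -1 + x)
sapply(no_int, function(f) attr(terms(f), "intercept"))   # 0 0 0 0
```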

Finally, thanks to @mnel's helpful comment: why use ~ and not =? In standard mathematical notation, y~x denotes that y is equivalent to x, which is somewhat weaker than y=x. When you fit a linear model, you are not really saying y=x, but rather that you can model y as a linear function of x (y = ax+b, for example).

+5

To answer part of your question, the tilde is used to separate the left- and right-hand sides of a model formula. See ?"~" for more details.

+2

Source: https://habr.com/ru/post/1437916/

