What does the "standard formula interface for data.frame" mean in R?

Question


The documentation for aggregate states:

"aggregate.formula is the standard formula interface for aggregate.data.frame."

I am new to R and I do not understand what this means. Could someone please explain?

Thanks!

Uri

+6

r aggregate

Uri Laserson Sep 16 '11 at 21:49

1 answer

Dirk Eddelbuettel · Accepted Answer · 2011-09-16T22:00:41+0000

Scroll to the middle of the help(aggregate) examples section and you will see the following:

  ## Formulas, one ~ one, one ~ many, many ~ one, and many ~ many:
  aggregate(weight ~ feed, data = chickwts, mean)
  aggregate(breaks ~ wool + tension, data = warpbreaks, mean)
  aggregate(cbind(Ozone, Temp) ~ Month, data = airquality, mean)
  aggregate(cbind(ncases, ncontrols) ~ alcgp + tobgp, data = esoph, sum)

These are four different calls to aggregate(), all using the formula interface. The wording you quote has to do with the method dispatch mechanism used throughout R.
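As a rough illustration of method dispatch, here is a minimal sketch using a made-up generic. The names describe, describe.formula and describe.data.frame are hypothetical and exist only for this example; they are not part of R or of aggregate(). The point is only that the class of the first argument selects the method, and that a formula method can hand its work to a data.frame method.

```r
# Hypothetical S3 generic: dispatches on the class of its first argument
describe <- function(x, ...) UseMethod("describe")

# Method chosen when x is a formula: resolve the formula against the data,
# then forward to the data.frame method
describe.formula <- function(x, data, ...) {
  mf <- model.frame(x, data = data)  # a data.frame holding the formula's variables
  describe(mf, ...)                  # now dispatches to describe.data.frame
}

# Method chosen when x is a data.frame
describe.data.frame <- function(x, ...) {
  paste("data.frame with columns:", paste(names(x), collapse = ", "))
}

res <- describe(weight ~ feed, data = chickwts)
print(res)
# [1] "data.frame with columns: weight, feed"
```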

Consider the first example:

 R> class(weight ~ feed)
 [1] "formula"
 R> class(chickwts)
 [1] "data.frame"

So aggregate() dispatches on its first argument, which is of class formula. The way formulas are resolved in R usually revolves around model.matrix; I assume something similar happens here, and that an equivalent call is ultimately executed by aggregate.data.frame, using the second argument, chickwts, a data.frame.

 R> aggregate(weight ~ feed, data = chickwts, mean)
        feed  weight
 1    casein 323.583
 2 horsebean 160.200
 3   linseed 218.750
 4  meatmeal 276.909
 5   soybean 246.429
 6 sunflower 328.917
 R>
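For comparison, the same summary can be requested without a formula by passing data.frames directly, which goes straight to the data.frame method. This is a sketch of one equivalent call, not the only way to write it; the variable names by_formula and by_df are made up for the example.

```r
# Formula interface: the first argument is a formula
by_formula <- aggregate(weight ~ feed, data = chickwts, FUN = mean)

# data.frame interface: the first argument is a data.frame,
# and the grouping variable is given as a named list via 'by'
by_df <- aggregate(chickwts["weight"], by = chickwts["feed"], FUN = mean)

# Both calls should yield the same table of group means
print(all.equal(by_formula, by_df))
```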

What you asked is not the easiest question for a beginner. I would recommend a good look at some of the documentation and a decent R book, if you have one handy. (Other SO questions give advice on what to read next.)

Edit: I had to dig a little, because aggregate.formula() is not exported from the stats namespace, but you can look at it by typing stats:::aggregate.formula at the prompt, which clearly shows that it does indeed dispatch to aggregate.data.frame():

  [.... some code omitted ...]
      if (is.matrix(mf[[1L]])) {
          lhs <- as.data.frame(mf[[1L]])
          names(lhs) <- as.character(m[[2L]][[2L]])[-1L]
          aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...)
      }
      else aggregate.data.frame(mf[1L], mf[-1L], FUN = FUN, ...)
  }
  <environment: namespace:stats>
  R>
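To make that last step concrete, here is a small sketch that fetches the method with getS3method() and then reproduces the final branch by hand for the chickwts example. It assumes that mf in the source above is a model frame built from the formula and the data (hence the model.frame() call here); the omitted part of the function is not reproduced, and the names f, via_df and via_formula are made up for illustration.

```r
# Retrieve the formula method without relying on it being exported
f <- getS3method("aggregate", "formula")
print(is.function(f))

# Assumed equivalent of 'mf': a data.frame with the response first,
# followed by the grouping variable(s)
mf <- model.frame(weight ~ feed, data = chickwts)

# Mirror of the final branch: first column is aggregated, the rest are groups
via_df <- aggregate.data.frame(mf[1L], mf[-1L], FUN = mean)

# The ordinary formula call, for comparison
via_formula <- aggregate(weight ~ feed, data = chickwts, FUN = mean)
print(all.equal(via_df, via_formula))
```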
