R: Anova function for different subsets of the same data set and data collection

The general task is to perform a certain statistical analysis (for example, anova, glm or a mixed model) on different subsets of the data set and combine the output tables with the total coefficients and p values ​​in one data frame. I look at least for a general function that will take on the type of model (for example, aov(...)either lm(...)or glm(...)or glmer(...)) and specific output terms for which coefficients and p values ​​should be returned for each of the replication analyzes in accordance with some variables (groups) of the grouping in one dataset.

Say, if I have a data framework in which I would like to conduct a certain analysis at different levels of the “replicate” factor in the data frame data:

data(iris)
library(car)
data=data.frame()
for (i in 1:10) {data=rbind(data,cbind(replicate=i,iris))}

Using broom+dplyr, I could, for example, make anova on each subset of this data block (group by replication) and save the p values ​​for the term “Views”, using

library(devtools)
install_github("dgrtwo/broom")
library(broom)
library(dplyr)

group_by(data, replicate) %>% do(tidy(Anova(aov(Sepal.Length ~ Species, data = .),type="III"))) %>% filter(term=="Species")

Source: local data frame [10 x 6]
Groups: replicate [10]

   replicate    term    sumsq    df statistic      p.value
       (int)   (chr)    (dbl) (dbl)     (dbl)        (dbl)
1          1 Species 189.6364     2  362.6614 2.580311e-94
2          2 Species 189.6364     2  362.6614 2.580311e-94
3          3 Species 189.6364     2  362.6614 2.580311e-94
4          4 Species 189.6364     2  362.6614 2.580311e-94
5          5 Species 189.6364     2  362.6614 2.580311e-94
6          6 Species 189.6364     2  362.6614 2.580311e-94
7          7 Species 189.6364     2  362.6614 2.580311e-94
8          8 Species 189.6364     2  362.6614 2.580311e-94
9          9 Species 189.6364     2  362.6614 2.580311e-94
10        10 Species 189.6364     2  362.6614 2.580311e-94

(I used 10 identical subsets of data as an example)

I look at least for the more general function " Anovabygroup", which will take a file frame, a grouping variable (here replicate, but it can also be a combination of several grouping variables), the type of model to run (for example, in this case 'aov(Sepal.Length ~ Species, data = .)', but it can also be a model lm, glm, lme, lmer or glmer or any other model processedAnova()), , p (, "", ) ( Anova). - , , , , ? , , , - (, "aov (Sepal.Length ~ Species, data =.)" ) . , , - ? , , , ...

PS github broom, CRAN, , Anova

+4
1

:

, -, . parse eval.

(glm, lm ..), , .

Anova :

anova_wrapper <- function(data, model_expression_as_string, grouping_variable,...) {
  f_wrap <- paste0('function(.) {',model_expression_as_string,'}') %>%    parse(text=.) %>% eval
  data %>% group_by_(grouping_variable) %>% 
     do(f_wrap(.) %>% Anova(...=...) %>% tidy) %>% return
}

:

( !):

data=data.frame()
for (i in 1:10) {data=rbind(data,cbind(replicate=i,iris))}

aov_model_expression_as_string = 'aov(Sepal.Length ~ Species, data = .)'
lm_model_expression_as_string = 'lm(Sepal.Length ~ Sepal.Width + Petal.Length , data = .)'
grouping_variable = 'replicate'


data %>% 
    anova_wrapper(model_expression_as_string = aov_model_expression_as_string,
                  grouping_variable = grouping_variable,type="III")



    Source: local data frame [30 x 6]
    Groups: replicate [10]

    replicate        term      sumsq    df statistic       p.value
    (int)       (chr)      (dbl) (dbl)     (dbl)         (dbl)
    1          1 (Intercept) 1253.00180     1 4728.1630 1.134286e-113
    2          1     Species   63.21213     2  119.2645  1.669669e-31
    3          1   Residuals   38.95620   147        NA            NA
    4          2 (Intercept) 1253.00180     1 4728.1630 1.134286e-113
    5          2     Species   63.21213     2  119.2645  1.669669e-31
    6          2   Residuals   38.95620   147        NA            NA
    7          3 (Intercept) 1253.00180     1 4728.1630 1.134286e-113
    8          3     Species   63.21213     2  119.2645  1.669669e-31
    9          3   Residuals   38.95620   147        NA            NA
    10         4 (Intercept) 1253.00180     1 4728.1630 1.134286e-113
    ..       ...         ...        ...   ...       ...           ...

lm aov Anova:

data %>% 
    anova_wrapper(model_expression_as_string = lm_model_expression_as_string,
                  grouping_variable = grouping_variable,type="III")



    replicate         term    sumsq    df statistic      p.value
    (int)        (chr)    (dbl) (dbl)     (dbl)        (dbl)
    1          1  Sepal.Width  8.19627     1  73.78707 1.163254e-14
    2          1 Petal.Length 84.42733     1 760.05861 5.847914e-60
    3          1    Residuals 16.32876   147        NA           NA
    4          2  Sepal.Width  8.19627     1  73.78707 1.163254e-14
    5          2 Petal.Length 84.42733     1 760.05861 5.847914e-60
    6          2    Residuals 16.32876   147        NA           NA
    7          3  Sepal.Width  8.19627     1  73.78707 1.163254e-14
    8          3 Petal.Length 84.42733     1 760.05861 5.847914e-60
    9          3    Residuals 16.32876   147        NA           NA
    10         4  Sepal.Width  8.19627     1  73.78707 1.163254e-14
    ..       ...          ...      ...   ...       ...          ...

/anovas monte carlo, . R , !

+3

Source: https://habr.com/ru/post/1606285/


All Articles