Dplyr and tidyr - calculation of several linear models with factors at once

Question


After reading more about the tidyverse, I started fitting many linear models at once with the nest() and map() list-column approach. Specifically, I would do something along these lines:

library(dplyr)
library(tidyr)
library(purrr)
df <- data.frame(y = rnorm(10), 
                 x1 = runif(10),
                 x2 = runif(10))

df %>%
  gather(covariate, value, x1:x2) %>% 
  group_by(covariate) %>% 
  nest() %>% 
  mutate(model = map(.x = data , .f = ~lm(y ~ value, data = .))) %>% 
  mutate(rsquared = map_dbl(.x = model, .f = ~summary(.)$r.squared))

The problem is that this approach breaks down when the variables are not all of the same type, for example when most are numeric but one is a factor, because gather() coerces the entire value column to character. For instance,

df <- data.frame(y = rnorm(10), 
                 x1 = runif(10),
                 x3 = sample(c("a", "b", "c"), 10, replace = TRUE))

df %>%
  gather(covariate, value, x1:x3) %>% 
  sapply(class)

This produces the following warning and column classes:

Warning message:
attributes are not identical across measure variables; they will be dropped 

          y   covariate       value 
  "numeric" "character" "character"

The value column is now character, so the nest() trick no longer works: every covariate, including the numeric x1, will be treated as a factor in the model.
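The consequence for the numeric covariate can be checked directly. This is a minimal sketch that reuses the question's data, with one assumption: x3 is built with rep() instead of sample() so the example is reproducible. Fitting the x1 group on the stacked data gives one dummy coefficient per distinct x1 value rather than a single slope.

```r
library(dplyr)
library(tidyr)

set.seed(1)
# Same shape as the question's data; x3 uses rep() (assumption) so that
# all three levels are guaranteed to appear
df <- data.frame(y  = rnorm(10),
                 x1 = runif(10),
                 x3 = rep(c("a", "b", "c"), length.out = 10))

# gather() stacks x1 and x3 into one column, so `value` becomes character
long <- gather(df, covariate, value, x1:x3)

# Fit the x1 model on the stacked data: lm() treats the character column
# as a factor, creating a dummy variable for each distinct x1 value
x1_rows <- filter(long, covariate == "x1")
fit_x1  <- lm(y ~ value, data = x1_rows)

class(x1_rows$value)  # "character"
length(coef(fit_x1))  # intercept + 9 dummies, not intercept + slope
```

With 10 distinct x1 values and 10 rows, the model is saturated, which is why the R-squared from the original pipeline is meaningless once the predictor has been coerced.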

Is there a way to make this work?


r dplyr tidyr

Theodor · 10 ... '16 17:23


aosmith · Accepted Answer · 2016-08-10T17:52:20+0000

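gather() cannot keep columns of different types in a single value column, so value ends up as character. One workaround is to convert it back to an appropriate type inside the model-fitting step, using either type_convert() from readr or type.convert() from base R.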

Using type_convert():

mutate(model = map(.x = data , .f = ~lm(y ~ value, data = readr::type_convert(.))))
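To see what type_convert() does to a single nested group, here is a small sketch. The tibbles group_x1 and group_x3 are hypothetical stand-ins for one element of the data list-column after gather(); the numbers are made up purely for illustration.

```r
library(readr)
library(tibble)

# Hypothetical nested groups: `value` holds text after gather()
group_x1 <- tibble(y = c(0.3, -1.2, 0.8), value = c("0.12", "0.57", "0.91"))
group_x3 <- tibble(y = c(0.3, -1.2, 0.8), value = c("a", "b", "c"))

# type_convert() re-guesses the type of each character column
converted_x1 <- type_convert(group_x1)
converted_x3 <- type_convert(group_x3)

class(converted_x1$value)  # "numeric": lm() fits a single slope
class(converted_x3$value)  # "character": lm() treats it as a factor
```

Because type_convert() only re-parses character columns, the numeric y column passes through unchanged.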

Using type.convert():

mutate(model = map(.x = data , .f = ~lm(y ~ type.convert(value, as.is = TRUE), data = .)))
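The base R route works on the vector rather than the whole data frame. A short sketch of how type.convert() treats the two kinds of group, using made-up input vectors (num_text and chr_text are illustrative names, not from the answer):

```r
# Text as it appears in the value column after gather()
num_text <- c("0.12", "0.57", "0.91")
chr_text <- c("a", "b", "c")

# as.is = TRUE keeps non-numeric strings as character; lm() still
# treats a character predictor as a factor
class(type.convert(num_text, as.is = TRUE))   # "numeric"
class(type.convert(chr_text, as.is = TRUE))   # "character"

# as.is = FALSE turns non-numeric strings into a factor instead
class(type.convert(chr_text, as.is = FALSE))  # "factor"
```

Passing as.is explicitly is a deliberate choice here: recent R versions warn when it is left unspecified, and for lm() either setting produces the same fit.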

Putting it all together:

df %>%
    gather(covariate, value, x1:x3) %>% 
    group_by(covariate) %>% 
    nest() %>% 
    mutate(model = map(.x = data , .f = ~lm(y ~ type.convert(value, as.is = TRUE), data = .))) %>% 
    mutate(rsquared = map_dbl(.x = model, .f = ~summary(.)$r.squared))

# A tibble: 2 x 4
  covariate              data    model   rsquared
      <chr>            <list>   <list>      <dbl>
1        x1 <tibble [10 x 2]> <S3: lm> 0.33176960
2        x3 <tibble [10 x 2]> <S3: lm> 0.06150498
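A different technique that avoids the round trip through character entirely is to skip gather() and build one formula per covariate with reformulate(), fitting each on the original data frame so every predictor keeps its own type. This is a sketch of that alternative, not part of the accepted answer; as before, x3 is built with rep() (an assumption) so the levels are reproducible.

```r
library(purrr)
library(tibble)

set.seed(1)
df <- data.frame(y  = rnorm(10),
                 x1 = runif(10),
                 x3 = rep(c("a", "b", "c"), length.out = 10))

covariates <- c("x1", "x3")

# reformulate("x1", response = "y") builds y ~ x1, and so on; each model
# is fitted on the untouched df, so column types are never coerced
fits <- tibble(
  covariate = covariates,
  model     = map(covariates, ~ lm(reformulate(.x, response = "y"), data = df)),
  rsquared  = map_dbl(model, ~ summary(.x)$r.squared)
)
fits
```

The x1 model has an intercept and one slope, while the x3 model has an intercept and two dummy coefficients, which is the behaviour the original pipeline was aiming for.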
