purrr: map a t.test onto a split data frame

I'm new to purrr, Hadley's promising functional programming library for R. I am trying to take a grouped and split data frame and run a t-test on a variable. An example using a sample dataset might look like this.

 mtcars %>%
   dplyr::select(cyl, mpg) %>%
   group_by(as.character(cyl)) %>%
   split(.$cyl) %>%
   map(~ t.test(.$`4`$mpg, .$`6`$mpg))

This results in the following error:

 Error in var(x) : 'x' is NULL
 In addition: Warning messages:
 1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
 2: In mean.default(x) : argument is not numeric or logical: returning NA

Am I just misunderstanding how map works? Or is there a better way to think about this?

+5
3 answers

Especially with pipes that need multiple inputs (we don't have Haskell's Arrows here), I find it easier to reason about types and signatures first, then encapsulate the logic in functions (which you can unit test), and only then write a short chain.

In this case, you want to compare all possible pairs of vectors, so I would set the goal of writing a function that takes a pair (i.e. a list of two) of vectors and returns the two-sample t.test of them.

Once you have that, you just need some glue. So the plan is this:

  • Write a function that takes a list of two vectors and performs a two-sample t-test.
  • Write a function/pipe that extracts the vectors from mtcars (easy).
  • Map the above over the list of pairs.

It is important to have this plan before writing any code. Things are somewhat obscured by the fact that R is not strongly typed, but this way you reason about "types" first and implementation second.

Step 1

t.test takes its arguments through dots (...), so we use purrr::lift to make it take a list instead. Since we do not want to match on the names of the list elements, we use .unnamed = TRUE. We also make it extra clear that we are using t.test with arity 2 (although this extra step is not needed for the code to work).

 t.test2 <- function(x, y) t.test(x, y)
 liftedTT <- lift(t.test2, .unnamed = TRUE)
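For readers on a recent purrr, where lift() has been deprecated (as of purrr 1.0.0), the same "take a list instead of separate arguments" idea can be sketched with base R's do.call(). The name lifted_tt is made up for this illustration; it is not part of the original answer.

```r
# Arity-2 wrapper around t.test, as in the answer above.
t.test2 <- function(x, y) t.test(x, y)

# Hypothetical stand-in for lift(t.test2, .unnamed = TRUE):
# take a list of two vectors, drop its names so the elements are
# matched by position, and pass them on as x and y.
lifted_tt <- function(args) do.call(t.test2, unname(args))

# Usage: the list names ("a", "b") are ignored on purpose.
lifted_tt(list(a = mtcars$mpg[mtcars$cyl == 4],
               b = mtcars$mpg[mtcars$cyl == 6]))
```

Dropping the names matters because do.call() would otherwise try to match "a" and "b" against the argument names x and y and fail.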

Step 2

Wrap the function from step 1 in a functional chain that takes a simple pair (here I use indices; it should be easy to use the factor levels of cyl instead, but I don't have time to work that out).

 # extract() here is magrittr::extract (i.e. `[`), not tidyr::extract
 doTT <- function(pair) {
   mtcars %>%
     split(as.character(.$cyl)) %>%
     map(~ select(., mpg)) %>%
     extract(pair) %>%
     liftedTT %>%
     broom::tidy
 }

Step 3

Now that we have all the Lego pieces ready, the composition is trivial.

 1:length(unique(mtcars$cyl)) %>%
   combn(2) %>%
   as.data.frame %>%
   as.list %>%
   map(~ doTT(.))

 $V1
   estimate estimate1 estimate2 statistic      p.value parameter conf.low conf.high
 1 6.920779  26.66364  19.74286  4.719059 0.0004048495  12.95598 3.751376  10.09018

 $V2
   estimate estimate1 estimate2 statistic      p.value parameter conf.low conf.high
 1 11.56364  26.66364      15.1  7.596664 1.641348e-06  14.96675 8.318518  14.80876

 $V3
   estimate estimate1 estimate2 statistic      p.value parameter conf.low conf.high
 1 4.642857  19.74286      15.1  5.291135 4.540355e-05  18.50248 2.802925  6.482789

There is a little left to clean up, mainly using the factor levels and keeping them in the output (and not relying on global variables in the second function), but I think the core of what you wanted is here. The trick to not getting lost, in my experience, is to work from the inside out.
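The cleanup mentioned above could look roughly like the following base R sketch. It is not the original author's code: the function name pairwise_tt and its arguments are made up for illustration, and it builds a plain data frame by hand instead of calling broom::tidy. It takes the data as an argument (no globals), uses the group levels rather than indices, and keeps them in the output.

```r
# Hypothetical helper: Welch two-sample t-tests for every pair of groups.
# `data` is a data frame; `group` and `value` are column names (strings).
pairwise_tt <- function(data, group, value) {
  # One numeric vector per group level, named by level ("4", "6", "8").
  vecs <- split(data[[value]], data[[group]])

  # All unordered pairs of level names.
  pairs <- combn(names(vecs), 2, simplify = FALSE)

  # One output row per pair, carrying the level names along.
  rows <- lapply(pairs, function(p) {
    tt <- t.test(vecs[[p[1]]], vecs[[p[2]]])
    data.frame(
      group1    = p[1],
      group2    = p[2],
      statistic = unname(tt$statistic),
      p.value   = tt$p.value,
      conf.low  = tt$conf.int[1],
      conf.high = tt$conf.int[2]
    )
  })
  do.call(rbind, rows)
}

pairwise_tt(mtcars, "cyl", "mpg")
```

The statistic and p.value columns should agree with the $V1, $V2 and $V3 results shown above, now labelled by cylinder count.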

+6

I don't quite follow the expected result, but this may be a starting point for an answer. Inside a formula argument, purrr's map() refers to the current element as .x.

Here is one way to achieve what I think you are trying to do with purrr.

 mtcars %>%
   split(as.character(.$cyl)) %>%
   map(~ t.test(.x$mpg))
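To see why the call in the question fails, it may help to look at what map() actually hands to the formula. The sketch below assumes only purrr and the built-in mtcars data; the variable names are made up for illustration.

```r
library(purrr)

by_cyl <- split(mtcars, mtcars$cyl)

# map() visits one element at a time, so inside the formula `.` / `.x`
# is the data frame for a single cyl group, not the whole split list.
# That data frame has no column called `4`, so the lookup gives NULL,
# and t.test() ends up being called on NULL.
first_piece <- by_cyl[[1]]
is.null(first_piece$`4`)

# One-sample test per group: this is the shape map() is suited to.
one_sample <- map(by_cyl, ~ t.test(.x$mpg))

# A two-sample test needs two groups at once, so index the list directly.
two_sample <- t.test(by_cyl[["4"]]$mpg, by_cyl[["6"]]$mpg)
```

In other words, map() alone gives one result per group; comparing groups against each other needs a list of pairs to map over.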

purrr::by_slice() also pairs nicely with dplyr::group_by().

 library(purrr)
 library(dplyr)

 mtcars %>%
   dplyr::select(cyl, mpg) %>%
   group_by(as.character(cyl)) %>%
   by_slice(~ t.test(.x$mpg))

Or you can skip purrr completely and use dplyr::summarise().

 library(dplyr)

 # Store each group's htest object in a list column;
 # a bare `mpg` refers to the current group's values.
 mtcars %>%
   dplyr::select(cyl, mpg) %>%
   group_by(as.character(cyl)) %>%
   summarise(t_test = list(t.test(mpg)))

If the nested result is confusing, broom can give us a tidy data.frame summarizing the results.

purrr + broom + tidyr

 library(broom)
 library(tidyr)

 mtcars %>%
   group_by(as.character(cyl)) %>%
   by_slice(~ tidy(t.test(.x$mpg))) %>%
   unnest()

dplyr + broom

 library(broom)

 mtcars %>%
   dplyr::select(cyl, mpg) %>%
   group_by(as.character(cyl)) %>%
   do(tidy(t.test(.$mpg)))
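On a recent purrr, by_slice() is no longer available (it was moved to the purrrlyr package). A rough equivalent of the tidy per-group summary, assuming only purrr, dplyr and broom are installed, might be:

```r
library(purrr)
library(dplyr)
library(broom)

# Split into one data frame per cyl, tidy a one-sample t-test for each,
# then stack the one-row results; .id turns the list names into a column.
per_cyl <- mtcars %>%
  split(.$cyl) %>%
  map(~ tidy(t.test(.x$mpg))) %>%
  bind_rows(.id = "cyl")

per_cyl
```

This gives one row per cylinder count, with the group label in the cyl column.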

Edited to include response to comment

With pipes, we can get carried away pretty quickly. I think Walt handled the answer well, but I wanted to make sure I gave a purrr answer too. I hope the use of pipeR is not too confusing.

 library(purrr)
 library(dplyr)
 library(broom)
 library(tidyr)
 library(pipeR)

 mtcars %>>%
   (split(., .$cyl)) %>>%
   (split_cyl ~
     names(split_cyl) %>>%
     (cross_d(list(against = ., tested = .), .filter = `==`)) %>>%
     by_row(~ tidy(t.test(split_cyl[[.x$tested]]$mpg,
                          split_cyl[[.x$against]]$mpg)))
   ) %>>%
   unnest()
+9

To perform two-sample t-tests, you need to create the combinations of cylinder counts. I don't see a way to create combinations using purrr functions alone. However, a method that uses only purrr and base R functions is

 library(purrr)

 t_test2 <- mtcars %>%
   split(.$cyl) %>%
   transpose() %>%
   .[["mpg"]] %>%
   (function(x) combn(names(x), m = 2,
                      function(y) t.test(flatten_dbl(x[y[1]]), flatten_dbl(x[y[2]])),
                      simplify = FALSE))

although this seems a little contrived.

A similar approach that uses only base R functions with a pipe is

 t_test <- mtcars %>%
   split(.$cyl) %>%
   (function(x) combn(names(x), m = 2, function(y) x[y], simplify = FALSE)) %>%
   lapply(function(x) t.test(x[[1]]$mpg, x[[2]]$mpg))
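If only the p-values of the pairwise comparisons are needed, base R's stats::pairwise.t.test() may be a shorter route. This is not part of the original answer. The sketch assumes unadjusted p-values and no pooled standard deviation, which should make each cell correspond to a separate Welch t.test() on that pair.

```r
# Pairwise two-sample t-tests of mpg across cyl groups.
# pool.sd = FALSE: each pair is tested on its own, as t.test() would.
# p.adjust.method = "none": no multiple-comparison correction
# (the function's default is a Holm correction).
pw <- pairwise.t.test(mtcars$mpg, mtcars$cyl,
                      p.adjust.method = "none",
                      pool.sd = FALSE)

# Lower-triangular matrix of p-values, indexed by group level.
pw$p.value
```

Unlike the combn() approaches, this returns only p-values, not the full test objects with estimates and confidence intervals.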
+2

Source: https://habr.com/ru/post/1243596/