Especially with pipes that need multiple inputs (we don't have Haskell's Arrows here), I find it easier to reason about types / signatures first, then encapsulate the logic in functions (which you can unit test), and only then write a short chain.
In this case, you want to compare all possible pairs of vectors, so I would set the goal of writing a function that takes a pair (i.e. a list of 2) of vectors and returns the two-sided t.test of them.
Once you have that, you just need some glue. So the plan is this:
- Write a function that takes a list of two vectors and performs a two-sided t-test.
- Write a function / pipe that extracts the vectors from mtcars (easy).
- Map the above over the list of pairs.
It is important to have this plan before writing any code. Things are somewhat obscured by the fact that R is not strongly typed, but this way you reason about "types" first and implementation second.
Step 1
t.test takes its arguments through dots (...), so we use purrr::lift to make it take a list instead. Since we don't want to match on the names of the list elements, we set .unnamed = TRUE. We are also being extra explicit that we are using t.test with arity 2 (although this extra step is not needed for the code to work).
    library(purrr)  # lift(), map()

    t.test2  <- function(x, y) t.test(x, y)      # explicit arity 2
    liftedTT <- lift(t.test2, .unnamed = TRUE)   # takes one unnamed list
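If lift feels opaque, here is a rough base-R sketch of what the lifted function amounts to: it takes one list of two vectors and hands them to t.test by position, ignoring the names. The name tt_from_pair is made up for illustration; it is not part of purrr or of the code above.

```r
# Base-R sketch of a "list of two vectors -> t-test" function.
# `tt_from_pair` is a hypothetical name used only for this example.
tt_from_pair <- function(pair) {
  stopifnot(is.list(pair), length(pair) == 2)
  # Positional access, so the names of the list elements are ignored.
  t.test(pair[[1]], pair[[2]])
}

four_cyl <- mtcars$mpg[mtcars$cyl == 4]
six_cyl  <- mtcars$mpg[mtcars$cyl == 6]

res <- tt_from_pair(list(a = four_cyl, b = six_cyl))
res$p.value
```

Because it is a plain function of one argument, this is the piece you can unit test in isolation before wiring it into any chain.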
Step 2
Wrap the function from step 1 in a functional chain that takes a simple pair (here I use indexes; it should be easy to use the factor levels of cyl instead, but I don't have time to work that out).
    library(dplyr)  # %>%, select()

    doTT <- function(pair) {
      mtcars %>%
        split(as.character(.$cyl)) %>%
        map(~ select(., mpg)) %>%
        magrittr::extract(pair) %>%   # same as `[`(pair)
        liftedTT %>%
        broom::tidy
    }
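To see what the chain passes around, here is a small base-R sketch of the split-and-extract step on its own. It assumes you only need the mpg vectors, so it splits that column directly instead of going through select; the variable names are illustrative.

```r
# Split mpg by number of cylinders: a named list of numeric vectors.
groups <- split(mtcars$mpg, mtcars$cyl)
names(groups)            # "4" "6" "8"

# Picking a pair by position, as doTT does with its `pair` argument.
pair <- groups[c(1, 2)]
lengths(pair)            # 11 four-cylinder cars, 7 six-cylinder cars
```

The result of the extraction is exactly the "list of 2 vectors" that the function from step 1 expects, which is why the two pieces snap together.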
Step 3
Now that all the Lego pieces are ready, the composition is trivial.
    1:length(unique(mtcars$cyl)) %>%
      combn(2) %>%
      as.data.frame %>%
      as.list %>%
      map(~ doTT(.))

Output:

    $V1
      estimate estimate1 estimate2 statistic      p.value parameter conf.low conf.high
    1 6.920779  26.66364  19.74286  4.719059 0.0004048495  12.95598 3.751376  10.09018

    $V2
      estimate estimate1 estimate2 statistic      p.value parameter conf.low conf.high
    1 11.56364  26.66364      15.1  7.596664 1.641348e-06  14.96675 8.318518  14.80876

    $V3
      estimate estimate1 estimate2 statistic      p.value parameter conf.low conf.high
    1 4.642857  19.74286      15.1  5.291135 4.540355e-05  18.50248 2.802925  6.482789
There is still a bit of cleanup to do, mainly using the factor levels and preserving them in the output (and not relying on the global mtcars inside the second function), but I think the core of what you wanted is here. The trick to not getting lost, in my experience, is to work from the inside out.
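For what it's worth, one way that cleanup might look is sketched below in base R only. It passes the data in as an argument instead of reading the global, pairs up the group labels rather than indexes, and keeps those labels in the output. The function name pairwise_tt and the output column names are my own choices for illustration, not something the code above defines.

```r
# Sketch: all pairwise Welch t-tests of `value` across the levels of `group`.
# `pairwise_tt` and its output columns are hypothetical names.
pairwise_tt <- function(data, value, group) {
  groups <- split(data[[value]], data[[group]])        # named list of vectors
  pairs  <- combn(names(groups), 2, simplify = FALSE)  # all pairs of labels

  rows <- lapply(pairs, function(p) {
    tt <- t.test(groups[[p[1]]], groups[[p[2]]])
    data.frame(
      group1    = p[1],
      group2    = p[2],
      estimate  = unname(tt$estimate[1] - tt$estimate[2]),  # difference in means
      statistic = unname(tt$statistic),
      p.value   = tt$p.value,
      conf.low  = tt$conf.int[1],
      conf.high = tt$conf.int[2]
    )
  })

  do.call(rbind, rows)  # one row per pair
}

result <- pairwise_tt(mtcars, "mpg", "cyl")
result
```

The shape is the same inside-out build as before: a tested function for one pair first, then a map over the pairs, then the glue that binds the rows together.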