Automatically calculate summary statistics for a data frame and create a new table

Question

I have the following data frame:

col1 <- c("avi","chi","chi","bov","fox","bov","fox","avi","bov",
          "chi","avi","chi","chi","bov","bov","fox","avi","bov","chi")
col2 <- c("low","med","high","high","low","low","med","med","med","high",
          "low","low","high","high","med","med","low","low","med")
col3 <- c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0)

test_data <- cbind(col1, col2, col3)
test_data <- as.data.frame(test_data)

And I want to get something like this table (the values are made up):

Species  Pop.density  %Resistance  CI_low  CI_high   Total samples
avi      low          2.0          1.2     2.2       30
avi      med          0            0       0.5       20
avi      high         3.5          2.9     4.2       10
chi      low          0.5          0.3     0.7       20
chi      med          2.0          1.9     2.1       150
chi      high         6.5          6.2     6.6       175

The % resistance is based on the col3 values above, where 1 = resistant and 0 = non-resistant. I tried the following:

library(dplyr)
test_data<-test_data %>%
  count(col1,col2,col3) %>%
  group_by(col1, col2) %>%
  mutate(perc_res = prop.table(n)*100)

This almost does the trick, since I get the percentage of 1s and 0s in col3 for each combination of col1 and col2. However, the total samples are wrong, because I count over all three columns when the correct count should be over col1 and col2 only.

For the confidence interval, I would use the following:

binom.test(resistant_samples, total_samples)$conf.int * 100

However, I'm not sure how to implement it along with the rest. Is there an easy and quick way to do this?

r dplyr

Haakonkas 15 Sep '17 at 14:37

One option is data.table (DT is constructed at the end of this answer):

library(data.table)
setDT(DT)

DT[, { 
  bt <- binom.test(sum(resists), .N)$conf.int*100
  .(res_rate = mean(resists)*100, res_lo = bt[1], res_hi = bt[2], n = .N)
}, keyby=.(species, popdens)]

    species popdens  res_rate    res_lo    res_hi n
 1:     avi     low   0.00000  0.000000  70.75982 3
 2:     avi     med   0.00000  0.000000  97.50000 1
 3:     bov     low 100.00000 15.811388 100.00000 2
 4:     bov     med  50.00000  1.257912  98.74209 2
 5:     bov    high 100.00000 15.811388 100.00000 2
 6:     chi     low   0.00000  0.000000  97.50000 1
 7:     chi     med  50.00000  1.257912  98.74209 2
 8:     chi    high  66.66667  9.429932  99.15962 3
 9:     fox     low   0.00000  0.000000  97.50000 1
10:     fox     med  50.00000  1.257912  98.74209 2
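As a sanity check on one row, here is a minimal base-R sketch. It assumes that binom.test() reports the exact (Clopper-Pearson) interval, which can also be written with beta quantiles. The variable names x, n, ci_binom and ci_beta are only for illustration. With 2 resistant samples out of 3 it should reproduce the chi/high row above.

```r
# x = resistant samples, n = total samples (the chi/high group: 2 of 3)
x <- 2
n <- 3

# Interval as reported by binom.test(), scaled to percentages
ci_binom <- as.numeric(binom.test(x, n)$conf.int) * 100

# Same interval from beta quantiles (Clopper-Pearson, 95% level)
ci_beta <- c(qbeta(0.025, x, n - x + 1), qbeta(0.975, x + 1, n - x)) * 100

print(ci_binom)
print(ci_beta)
```

With groups this small the intervals are very wide, so the percentages in the table should be read with that in mind.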

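To include every combination of species and population density (even those with no observations), join on the cross product built with CJ():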

DT[CJ(species = species, popdens = popdens, unique = TRUE), on=.(species, popdens), {
  bt <- 
    if (.N > 0L) binom.test(sum(resists), .N)$conf.int*100 
    else NA_real_
  .(res_rate = mean(resists)*100, res_lo = bt[1], res_hi = bt[2], n = .N)    
}, by=.EACHI]

    species popdens  res_rate    res_lo    res_hi n
 1:     avi     low   0.00000  0.000000  70.75982 3
 2:     avi     med   0.00000  0.000000  97.50000 1
 3:     avi    high        NA        NA        NA 0
 4:     bov     low 100.00000 15.811388 100.00000 2
 5:     bov     med  50.00000  1.257912  98.74209 2
 6:     bov    high 100.00000 15.811388 100.00000 2
 7:     chi     low   0.00000  0.000000  97.50000 1
 8:     chi     med  50.00000  1.257912  98.74209 2
 9:     chi    high  66.66667  9.429932  99.15962 3
10:     fox     low   0.00000  0.000000  97.50000 1
11:     fox     med  50.00000  1.257912  98.74209 2
12:     fox    high        NA        NA        NA 0

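How it works: the syntax is DT[i, j, by=], which means

Subset using i, or join on it when on= or roll= is given.
Group using by=, or keyby= to also sort the result by the grouping columns.
Evaluate j once per group.

j should evaluate to a list, and .() is shorthand for list(). See ?data.table for details.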
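To make the DT[i, j, by=] form concrete, here is a small sketch on a made-up table. The table toy and its columns g and x are hypothetical and are not the data from the question.

```r
library(data.table)

# Hypothetical toy table, only to illustrate the three parts of the call
toy <- data.table(g = c("a", "a", "b", "b", "b"), x = c(1, 2, 3, 4, 5))

# i:     keep rows where x > 1
# j:     a list of per-group summaries, written with .()
# keyby: group by g and sort the result by g
out <- toy[x > 1, .(total = sum(x), n = .N), keyby = .(g)]
print(out)
```

Here .N is the number of rows in each group after the subset in i has been applied.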

The data used (columns renamed, 0/1 treated as false/true, and the population density levels put in order):

DT = data.frame(
  species = col1, 
  popdens = factor(col2, levels=c("low", "med", "high")), 
  resists = col3
)

Frank 15 Sep '17 at 15:33

Jason Punyon · Accepted Answer · 2017-09-15T15:02:55+0000

A tidyverse approach, using broom to tidy the test output:

library(tidyverse)
library(broom)

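test_data %>%
  # Label the 0/1 outcome so the spread columns get readable names
  mutate(col3 = ifelse(col3 == 0, "NonResistant", "Resistant")) %>%
  # One row per species/density/outcome, then one column per outcome
  count(col1, col2, col3) %>%
  spread(col3, n, fill = 0) %>%
  # Multiply by 100 so the value is a percentage, like the CI columns
  mutate(PercentResistant = 100 * Resistant / (NonResistant + Resistant)) %>%
  # Exact binomial test per row; tidy() returns conf.low and conf.high
  mutate(test = map2(Resistant, NonResistant, ~ binom.test(.x, .x + .y) %>% tidy())) %>%
  unnest(test) %>%
  transmute(Species = col1, Pop.density = col2, PercentResistant,
            CI_low = conf.low * 100, CI_high = conf.high * 100,
            TotalSamples = Resistant + NonResistant)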

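Recode the 0/1 values in col3 to readable labels.
Count the rows in each col1/col2/col3 bucket.
Spread col3/n into two columns, Resistant/NonResistant, holding the counts (n). Each row now has everything it needs for the test.
Compute PercentResistant, then run binom.test on each row and store the tidied result in test.
Unnest test, so the confidence limits become ordinary columns.
Rename and select the final columns, scaling the confidence limits to percentages.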
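For comparison, here is a minimal sketch that stays close to the attempt in the question: group by the two columns only, so the total is counted per species/density pair, and compute the interval inside summarise(). It is not taken from either answer. The column names species, popdens and resists are borrowed from the data.table answer, binom_ci is a hypothetical helper, and it assumes a dplyr version that supports the .groups argument.

```r
library(dplyr)

col1 <- c("avi","chi","chi","bov","fox","bov","fox","avi","bov",
          "chi","avi","chi","chi","bov","bov","fox","avi","bov","chi")
col2 <- c("low","med","high","high","low","low","med","med","med","high",
          "low","low","high","high","med","med","low","low","med")
col3 <- c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0)

# data.frame() keeps col3 numeric (cbind() would turn every column into text)
df <- data.frame(species = col1, popdens = col2, resists = col3)

# Hypothetical helper: exact binomial CI, scaled to percentages
binom_ci <- function(x, n) as.numeric(binom.test(x, n)$conf.int) * 100

res <- df %>%
  group_by(species, popdens) %>%
  summarise(
    total   = n(),                                 # samples per species/density
    pct_res = 100 * mean(resists),                 # share of 1s, in percent
    ci_low  = binom_ci(sum(resists), total)[1],
    ci_high = binom_ci(sum(resists), total)[2],
    .groups = "drop"
  )
print(res)
```

Combinations that never occur in the data (for example avi/high) are simply absent from this result. The CJ() join in the data.table answer is one way to get them back as NA rows.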
