The conditional cumulative average for each group in R

I have a dataset that looks like this:

id   a   b
1    AA  2
1    AB  5
1    AA  1
2    AB  2
2    AB  4
3    AB  4
3    AB  3
3    AA  1
3    AA  4

I need to calculate the cumulative mean of b for each record within each id group, excluding rows where a == 'AA' from the calculation. The expected output is:

id   a   b  mean
1    AA  2   -
1    AB  5   5
1    AA  1   5
2    AB  2   2
2    AB  4   (4+2)/2
3    AB  4   4
3    AB  3   (4+3)/2
3    AA  1   (4+3)/2
3    AA  4   (4+3)/2
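Stated as a formula (the notation here is introduced only for illustration), the value for row $i$ within a group is the sum of the kept $b$ values up to and including that row, divided by the number of kept rows so far:

$$
\text{mean}_i = \frac{\sum_{j \le i} b_j \, [a_j \ne \text{AA}]}{\sum_{j \le i} [a_j \ne \text{AA}]}
$$

where $[a_j \ne \text{AA}]$ is 1 if row $j$ is kept and 0 otherwise. For the third row of id 3, for example, this gives $(4 + 3 + 0)/(1 + 1 + 0) = 3.5$. When no row has been kept yet, the denominator is 0, which is the "-" case.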

I tried to do this with dplyr and cummean, but got an error:

df <- df %>%
       group_by(id) %>%
       mutate(mean = cummean(b[a != 'AA']))

Error: incompatible size (123) expecting 147 (group size) or 1
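The error comes from the subsetting inside cummean(): b[a != 'AA'] drops the 'AA' rows, so the result is shorter than the group, and mutate() only accepts a result whose length equals the group size or 1. A minimal base-R illustration using group id 3 from the sample data (cumsum(x) / seq_along(x) stands in for dplyr's cummean() here so the snippet runs without packages):

```r
# Group id == 3 from the sample data
a <- c("AB", "AB", "AA", "AA")
b <- c(4, 3, 1, 4)

kept <- b[a != "AA"]                       # the 'AA' rows are dropped: 4 3
running <- cumsum(kept) / seq_along(kept)  # same values cummean(kept) would give

print(length(b))        # group size that mutate() expects: 4
print(length(running))  # what the subset produces: 2, hence the size error
```

A group with exactly one non-'AA' row (such as id 1) would not raise this error, because mutate() recycles a length-1 result across the whole group, which silently gives a different answer than intended.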

Can you suggest a better way to achieve the same result in R?

2 answers

Here you need to rebuild cummean yourself: divide the cumulative sum of the kept b values by the running count of kept rows. As a one-liner:

library(dplyr)
df %>% group_by(id) %>% mutate(mean = cumsum(b * (a != 'AA')) / cumsum(a != 'AA'))

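(a != 'AA' is a logical vector; in arithmetic R treats TRUE as 1 and FALSE as 0, so b * (a != 'AA') zeroes out the 'AA' rows and cumsum(a != 'AA') counts the kept rows. For the first row of id 1 this gives 0/0, i.e. NaN, in place of the "-" in the expected output.) The same idea, with the 0/1 indicator stored in its own column: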

df %>%
    group_by(id) %>%
    mutate(relevance = 0+(a!='AA'), 
           mean = cumsum(relevance * b) / cumsum(relevance))
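For readers not using dplyr, the same cumsum ratio can be sketched in base R with ave(), which applies a function within each id group and returns a vector aligned with the original rows. This is a swapped-in base-R equivalent, not the answer's own code; the sample data from the end of this page is rebuilt with data.frame() so the block runs on its own:

```r
# Sample data (same values as the structure() call at the end of the page)
df <- data.frame(
  id = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L),
  a  = c("AA", "AB", "AA", "AB", "AB", "AB", "AB", "AA", "AA"),
  b  = c(2L, 5L, 1L, 2L, 4L, 4L, 3L, 1L, 4L)
)

keep <- as.numeric(df$a != "AA")               # 1 for rows that count, 0 for 'AA'
num  <- ave(df$b * keep, df$id, FUN = cumsum)  # running sum of kept b values per id
den  <- ave(keep, df$id, FUN = cumsum)         # running count of kept rows per id
df$mean <- num / den                           # 0/0 gives NaN before the first kept row

print(df)
```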

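After grouping by "id", replace the "b" values where "a" is "AA" with NA using b*NA^(a=='AA'): NA^(a=='AA') is NA where 'a' is 'AA' and 1 otherwise, so multiplying by 'b' gives NA for the 'AA' rows and leaves 'b' unchanged elsewhere. Then na.aggregate (from zoo) replaces those NA values with the mean of the non-NA values, and cummean takes the cumulative mean. Finally, if the first row of a group has "a" equal to "AA", it is set to NA by multiplying with NA^(row_number()==1 & a=='AA').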
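The NA^(logical) trick relies on R's rule that any number raised to the power 0 is 1, even NA, while NA^1 stays NA; the logical is coerced to 0 or 1 first. A small base-R check on group id 1 (the variable names are illustrative):

```r
a <- c("AA", "AB", "AA")   # group id == 1
b <- c(2, 5, 1)

mask <- NA^(a == "AA")     # NA^TRUE -> NA, NA^FALSE -> 1
masked <- b * mask         # 'AA' rows become NA, other rows keep their b value
print(mask)                # NA  1 NA
print(masked)              # NA  5 NA
```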

library(zoo)
library(dplyr)
df %>% 
   group_by(id) %>% 
   mutate(Mean= cummean(na.aggregate(b*NA^(a=='AA')))*
                 NA^(row_number()==1 & a=='AA'))
# Source: local data frame [9 x 4]
#Groups: id [3]

#      id     a     b  Mean
#   (int) (chr) (int) (dbl)
#1     1    AA     2    NA
#2     1    AB     5   5.0
#3     1    AA     1   5.0
#4     2    AB     2   2.0
#5     2    AB     4   3.0
#6     3    AB     4   4.0
#7     3    AB     3   3.5
#8     3    AA     1   3.5
#9     3    AA     4   3.5

df <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L),
                     a  = c("AA", "AB", "AA", "AB", "AB", "AB", "AB", "AA", "AA"),
                     b  = c(2L, 5L, 1L, 2L, 4L, 4L, 3L, 1L, 4L)),
                .Names = c("id", "a", "b"),
                class = "data.frame", row.names = c(NA, -9L))
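One caution, checked here on a hypothetical group rather than the question's data: na.aggregate fills each NA with the mean of all non-NA values in the group, including values that come later, so the fill-then-cummean approach is not guaranteed to match the cumsum ratio when an 'AA' row comes before a kept row. The sample data happens to give the right answer anyway. The sketch below compares the two in base R; the fill step imitates zoo::na.aggregate's default behaviour (replace NA by the mean of the non-NA values) instead of calling zoo:

```r
# Hypothetical group: an 'AA' row sits between two kept rows
a <- c("AB", "AA", "AB")
b <- c(1, 10, 3)
keep <- as.numeric(a != "AA")

# cumsum ratio, as in the first answer
ratio <- cumsum(b * keep) / cumsum(keep)              # 1 1 2

# fill-then-cummean, imitating the second answer's na.aggregate step
x <- b * NA^(a == "AA")                               # 1 NA 3
filled <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)  # 1 2 3
fill_mean <- cumsum(filled) / seq_along(filled)       # 1 1.5 2

print(rbind(ratio, fill_mean))
```

The second row differs: the 'AA' row should carry forward the running mean of the kept rows so far (1), but the fill uses the mean of the whole group (2).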

Source: https://habr.com/ru/post/1607302/

