R: calculate aggregate amounts and bills from the last occurrence of a value

Question

R: calculate aggregate amounts and bills from the last occurrence of a value

with simplified data

set.seed(13) user_id = rep(1:2, each = 10) order_id = sample(1:20, replace = FALSE) cost = round(runif(20, 1.5, 75),1) category = sample( c("apples", "pears", "chicken"), 20, replace = TRUE) pit = rep(c(0,0,0,0,1), 4) df = data.frame(cbind(user_id, order_id, cost, category, pit)) user_id order_id cost category pit 1 15 11.6 pears 0 1 5 41.7 apples 0 1 8 51.3 chicken 0 1 2 40.3 pears 0 1 16 7.9 pears 1 1 1 47.1 chicken 0 1 9 3.8 apples 0 1 10 35.4 apples 0 1 11 25.8 chicken 0 1 20 48.1 chicken 1 2 7 32.6 pears 0 2 18 31.3 pears 0 2 14 69 apples 0 2 4 60.9 chicken 0 2 13 41.2 apples 1 2 17 9.4 pears 0 2 19 34.9 apples 0 2 6 5.3 pears 0 2 3 57.3 apples 0 2 12 7.7 apples 1

I would like to create columns with the total amount of costs and the number of individual categories since the last span == 1 . Thus, the result will look like this:

 user_id order_id cost category pit cum_cost distinct_categories 1 15 11.6 pears 0 11.6 1 1 5 41.7 apples 0 53.3 2 1 8 51.3 chicken 0 104.6 3 1 2 40.3 pears 0 144.9 3 1 16 7.9 pears 1 152.8 3 1 1 47.1 chicken 0 47.1 1 1 9 3.8 apples 0 50.9 2 1 10 35.4 apples 0 86.3 2 1 11 25.8 chicken 0 112.1 3 1 20 48.1 chicken 1 160.2 3 2 7 32.6 pears 0 32.6 1 2 18 31.3 pears 0 63.9 1 2 14 69 apples 0 132.9 2 2 4 60.9 chicken 0 193.8 3 2 13 41.2 apples 1 235.0 3 2 17 9.4 pears 0 9.4 1 2 19 34.9 apples 0 44.3 2 2 6 5.3 pears 0 49.6 2 2 3 57.3 apples 0 106.9 2 2 12 7.7 apples 1 114.6 2

Ideally, the solution would be in dplyr , but I am open to other packages / approaches. Many thanks for your help! Kasia

+3

r conditional dplyr cumsum

Kasia Kulma Dec 7 '16 at 14:57

source share

1 answer

akrun · Accepted Answer · 2016-12-07T15:16:37+0000

We can use dplyr . Grouping by "user_id" and the grouping variable created by calculating the total amount of the "hole" and getting its lag , we get cumsum "cost" as "cum_cost" and cummax the match index between the categories and unique 'category' as' distinct_categories.

 library(dplyr) df %>% group_by(user_id, ind= lag(cumsum(pit), default=0)) %>% mutate(cum_cost = cumsum(cost), distinct_categories = cummax(match(category, unique(category)))) # user_id order_id cost category pit ind cum_cost distinct_categories # <int> <int> <dbl> <chr> <int> <dbl> <dbl> <int> #1 1 3 49.8 apples 0 0 49.8 1 #2 1 13 14.8 chicken 0 0 64.6 2 #3 1 18 11.4 apples 0 0 76.0 2 #4 1 15 52.6 chicken 0 0 128.6 2 #5 1 11 13.6 chicken 1 0 142.2 2 #6 1 19 26.9 chicken 0 1 26.9 1 #7 1 2 54.9 chicken 0 1 81.8 1 #8 1 1 70.6 chicken 0 1 152.4 1 #9 1 10 55.0 chicken 0 1 207.4 1 #10 1 12 19.7 chicken 1 1 227.1 1 #11 2 8 40.0 pears 0 2 40.0 1 #12 2 16 37.4 pears 0 2 77.4 1 #13 2 20 70.5 pears 0 2 147.9 1 #14 2 5 63.8 apples 0 2 211.7 2 #15 2 14 31.9 apples 1 2 243.6 2 #16 2 17 9.1 chicken 0 3 9.1 1 #17 2 4 21.9 pears 0 3 31.0 2 #18 2 7 52.3 apples 0 3 83.3 3 #19 2 9 43.3 chicken 0 3 126.6 3 #20 2 6 9.9 pears 1 3 136.5 3

R: calculate aggregate amounts and bills from the last occurrence of a value

More articles: