If you use Reduce with accumulate = TRUE (or purrr::accumulate if you want), you can reset the difference in the move if it is greater than or equal to 5. Calling cumsum on whether 0 returns the number of discards.
library(tidyverse) df <- data.frame(id = c(1,1,1,1,1,2,2,2,2,3,3,3,3), val = c(2,10,12,15,17,2,4,7,8,12,15,20,25)) df %>% group_by(id) %>% mutate(reset = cumsum( Reduce( function(x, y) if (x + y >= 5) 0 else x + y, diff(val), init = 0, accumulate = TRUE ) == 0 )) #> # A tibble: 13 x 3 #> # Groups: id [3] #> id val reset #> <dbl> <dbl> <int> #> 1 1 2 1 #> 2 1 10 2 #> 3 1 12 2 #> 4 1 15 3 #> 5 1 17 3 #> 6 2 2 1 #> 7 2 4 1 #> 8 2 7 2 #> 9 2 8 2 #> 10 3 12 1 #> 11 3 15 1 #> 12 3 20 2 #> 13 3 25 3
or using purrr::accumulate ,
df %>% group_by(id) %>% mutate(reset = cumsum( accumulate( diff(val), ~if (.x + .y >= 5) 0 else .x + .y, .init = 0 ) == 0 )) #> # A tibble: 13 x 3 #> # Groups: id [3] #> id val reset #> <dbl> <dbl> <int> #> 1 1 2 1 #> 2 1 10 2 #> 3 1 12 2 #> 4 1 15 3 #> 5 1 17 3 #> 6 2 2 1 #> 7 2 4 1 #> 8 2 7 2 #> 9 2 8 2 #> 10 3 12 1 #> 11 3 15 1 #> 12 3 20 2 #> 13 3 25 3
Regarding editing, the problem is that some of the differences are 0, which is the same as what it considers to be discarded. The simplest solution is to use NA instead of zero as the reset value:
library(tidyverse) sub.df <- structure(list(`ID` = c("1", "1", "1", "1", "1", "1", "1", "1", "1"), dateFormat = structure(c(1479955726, 1479955726, 1483703713, 1495190809, 1495190809, 1497265079, 1497265079, 1474023059, 1474023061), class = c("POSIXct", "POSIXt"), tzone = "America/Chicago")), .Names = c("ID", "dateFormat"), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame")) sub.df %>% group_by(ID) %>% arrange(ID, dateFormat) %>% mutate(reset = cumsum(is.na( accumulate(diff(dateFormat), ~{ s <- sum(.x, .y, na.rm = TRUE); if (s >= 5) NA else s }, .init = NA) ))) #> # A tibble: 9 x 3 #> # Groups: ID [1] #> ID dateFormat reset #> <chr> <dttm> <int> #> 1 1 2016-09-16 05:50:59 1 #> 2 1 2016-09-16 05:51:01 1 #> 3 1 2016-11-23 20:48:46 2 #> 4 1 2016-11-23 20:48:46 2 #> 5 1 2017-01-06 05:55:13 3 #> 6 1 2017-05-19 05:46:49 4 #> 7 1 2017-05-19 05:46:49 4 #> 8 1 2017-06-12 05:57:59 5 #> 9 1 2017-06-12 05:57:59 5
Ultimately, this approach also faces limitations, although, as if any values ββwere indeed NA , it would increase in a similar way. A more robust solution would be to return a list of two elements from each iteration, one for the total number with resets, and one for counting reset. However, this is a lot of work to implement:
sub.df %>% group_by(ID) %>% arrange(ID, dateFormat) %>% mutate(total_reset = accumulate( transpose(list(total = diff(dateFormat), reset = rep(0, n() - 1))), ~{ s <- .x$total + .y$total; if (s >= 5) { data_frame(total = 0, reset = .x$reset + 1) } else { data_frame(total = s, reset = .x$reset) } }, .init = data_frame(total = 0, reset = 1) )) %>% unnest() #> # A tibble: 9 x 4 #> # Groups: ID [1] #> ID dateFormat total reset #> <chr> <dttm> <dbl> <dbl> #> 1 1 2016-09-16 05:50:59 0 1 #> 2 1 2016-09-16 05:51:01 2 1 #> 3 1 2016-11-23 20:48:46 0 2 #> 4 1 2016-11-23 20:48:46 0 2 #> 5 1 2017-01-06 05:55:13 0 3 #> 6 1 2017-05-19 05:46:49 0 4 #> 7 1 2017-05-19 05:46:49 0 4 #> 8 1 2017-06-12 05:57:59 0 5 #> 9 1 2017-06-12 05:57:59 0 5
The general one looks a little silly, but if you look at diff, it is really correct.