Filter rows by last maximum value, ordered by time

I have a dataframe with an id, an ordered time value and a value. For each group of id, I would like to remove the rows whose value is not greater than that of every row with a smaller time value.

data <- data.frame(id = c(rep(c("a", "b"), each = 3L), "b"), 
                   time = c(0, 1, 2, 0, 1, 2, 3),
                   value = c(1, 1, 2, 3, 1, 2, 4))

> data
  id time value
1  a    0     1
2  a    1     1
3  a    2     2
4  b    0     3
5  b    1     1
6  b    2     2
7  b    3     4

So the expected result would be:

> data
  id time value
1  a    0     1
2  a    2     2
3  b    0     3
4  b    3     4

(For id == "b", the rows where time %in% c(1, 2) are removed, because their value is smaller than that of a row with a lower time.)

I thought of using lag:

data %>%
   group_by(id) %>%
   filter(time == 0 | lag(value, order_by = time) < value)

Source: local data frame [5 x 3]
Groups: id [2]

      id  time value
  <fctr> <dbl> <dbl>
1      a     0     1
2      a     2     2
3      b     0     3
4      b     2     2
5      b     3     4

But this does not work as expected: lag is vectorized and only compares each row with the one immediately before it, so row 4 (id b, time 2) is kept even though its value is smaller than the value at time 0. The idea would be to use a "recursive lag" or to compare against the last maximum value. I could do this with a loop, but I'm sure there is a simpler, higher-level way to do it.

Any help would be appreciated, thanks!

+4
3

data.table:

library(data.table)
setDT(data)
data[, myVal := cummax(c(0, shift(value)[-1])), by=id][value > myVal][, myVal := NULL][]
   id time value
1:  a    0     1
2:  a    2     2
3:  b    0     3
4:  b    3     4

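Here shift lags value and cummax takes the running maximum of the lagged values. c(0, shift(value)[-1]) puts 0 in the first position, which assumes that all values are positive; min(value)-1 would be a safer starting value. The [-1] drops the first element returned by shift, which is NA. Rows whose value exceeds this running maximum are then kept, and the helper column myVal is removed.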
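
As a rough illustration of the lagged cumulative maximum, the sketch below traces the intermediate vectors for the group id == "b" using base R only. The variable names (lagged, running_max, keep) are made up for this illustration, and c(NA, head(value, -1)) stands in for data.table's shift.

```r
# Values of group "b", already ordered by time (taken from the question's data)
value <- c(3, 1, 2, 4)

# Lag by one position; base R stand-in for shift(value)
lagged <- c(NA, head(value, -1))          # NA 3 1 2

# Replace the leading NA with 0, then take the running maximum
running_max <- cummax(c(0, lagged[-1]))   # 0 3 3 3

# Keep rows whose value exceeds every earlier value in the group
keep <- value > running_max               # TRUE FALSE FALSE TRUE
value[keep]                               # 3 4
```

The rows at time 1 and 2 are dropped because the running maximum stays at 3 until a larger value appears.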

+3

Here is a self anti-join / non-equi join option using data.table:

library(data.table) # v1.10.0
setDT(data)[!data, on = .(id, time > time, value <= value)]
#    id time value
# 1:  a    0     1
# 2:  a    2     2
# 3:  b    0     3
# 4:  b    3     4

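Explanation: "Find all the rows where time is greater while value is less than or equal to that of another row with the same id, and remove them (the ! sign)."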
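
To spell out the condition in on = .(id, time > time, value <= value), here is a minimal base R sketch that applies the same rule row by row. It is a brute-force illustration, not how data.table evaluates the join, and the names is_dominated and result are hypothetical.

```r
# Same sample data as in the question
data <- data.frame(id = c("a", "a", "a", "b", "b", "b", "b"),
                   time = c(0, 1, 2, 0, 1, 2, 3),
                   value = c(1, 1, 2, 3, 1, 2, 4))

# A row is dropped if some other row with the same id has a
# smaller time and a value greater than or equal to its own
is_dominated <- sapply(seq_len(nrow(data)), function(i) {
  any(data$id == data$id[i] &
      data$time < data$time[i] &
      data$value >= data$value[i])
})

result <- data[!is_dominated, ]
result$time   # 0 2 0 3
```

This compares every pair of rows, so it is only meant to make the join condition readable on small data.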

+3

Here is an option with dplyr. After grouping by "id", we filter the rows where "value" is greater than the cumulative maximum of the "lag" of the column "value":

library(dplyr)
data %>% 
  group_by(id) %>%
  filter(value > cummax(lag(value, default = 0)) ) 
#      id  time value
#   <fctr> <dbl> <dbl>
#1      a     0     1
#2      a     2     2
#3      b     0     3
#4      b     3     4

Or another option is slice, after arranging by 'id' and 'time' (since the OP mentioned the ordering):

data %>%
     group_by(id) %>%
     arrange(id, time) %>%
     slice(which(value > cummax(lag(value, default = 0)))) 
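
Note that default = 0 relies on all values being positive. If values may be zero or negative, one possible variant is to start the running maximum at -Inf. The base R sketch below uses made-up sample data, and the names prev_max and result are hypothetical; it assumes the rows should be ordered by id and time first.

```r
# Hypothetical data containing zero and negative values
data <- data.frame(id = c("a", "a", "b", "b", "b"),
                   time = c(0, 1, 0, 1, 2),
                   value = c(-2, -1, 0, -3, 5))

# Make sure rows are ordered by id, then time
data <- data[order(data$id, data$time), ]

# Per id: running maximum of all earlier values, starting from -Inf
prev_max <- ave(data$value, data$id,
                FUN = function(v) cummax(c(-Inf, head(v, -1))))

# Keep rows that exceed every earlier value in their group
result <- data[data$value > prev_max, ]
result$value   # -2 -1 0 5
```

With a starting value of 0, the first rows of both groups here would have been dropped, since neither -2 nor 0 is greater than 0.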
0

Source: https://habr.com/ru/post/1667259/

