Filter rows by last maximum ordinal value by time

Question

I have a data frame with an id, an ordered time value, and a value. For each id group, I would like to remove the rows whose value is smaller than or equal to the value of a row with a smaller time.

data <- data.frame(id = c(rep(c("a", "b"), each = 3L), "b"), 
                   time = c(0, 1, 2, 0, 1, 2, 3),
                   value = c(1, 1, 2, 3, 1, 2, 4))

> data
  id time value
1  a    0     1
2  a    1     1
3  a    2     2
4  b    0     3
5  b    1     1
6  b    2     2
7  b    3     4

The expected result would be:

> data
  id time value
1  a    0     1
2  a    2     2
3  b    0     3
4  b    3     4

(For id == "b", the rows where time %in% c(1, 2) are removed, because their value is less than the value at a lower time.)

I thought of using lag:

data %>%
   group_by(id) %>%
   filter(time == 0 | lag(value, order_by = time) < value)

Source: local data frame [5 x 3]
Groups: id [2]

      id  time value
  <fctr> <dbl> <dbl>
1      a     0     1
2      a     2     2
3      b     0     3
4      b     2     2
5      b     3     4

But this does not work as expected: lag is vectorized, so each row is compared only with the row immediately before it (the row with id "b" and time 2 is kept because 2 > 1, even though the earlier value 3 is larger). The idea would be to use some kind of "recursive lag", or to compare against the last maximum value. I could do this with a loop, but I'm sure there is a simpler, higher-level way.

Any help would be appreciated, thanks!

r dplyr

Julien Navarre, Jan 19 '17 13:44

lmo · Answer 1 · 2017-01-19T14:05:20+0000

data.table:

library(data.table)
setDT(data)
data[, myVal := cummax(c(0, shift(value)[-1])), by=id][value > myVal][, myVal := NULL][]
   id time value
1:  a    0     1
2:  a    2     2
3:  b    0     3
4:  b    3     4

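This uses shift to lag value and cummax to take the running maximum of the lagged values; rows are kept only where value exceeds that maximum. c(0, shift(value)[-1]) puts a 0 in the first position, which assumes the values are positive; min(value)-1 could be used instead. The [-1] drops the first element returned by shift, which is NA.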
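To make the lag-then-cummax idea concrete, here is a minimal base R sketch that walks through the intermediate vectors for group "b" from the question. It uses head(value, -1) in place of data.table's shift, and the variable names lagged, prev_max and keep are only illustrative.

```r
# Values of group "b" from the question, already ordered by time
value <- c(3, 1, 2, 4)

# Lag by one position, padding the front with 0
# (assumption: all values are positive, as in the answer above)
lagged <- c(0, head(value, -1))   # 0 3 1 2

# Running maximum of everything seen before each row
prev_max <- cummax(lagged)        # 0 3 3 3

# Keep a row only when it exceeds every earlier value
keep <- value > prev_max          # TRUE FALSE FALSE TRUE

value[keep]                       # 3 4
```

The first row is always kept because its padded predecessor is 0, which is why the padding value has to lie below every real value.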

David Arenburg · Answer 2 · 2017-01-19T14:39:11+0000

Here is a possible solution combining an anti-join with a non-equi join in data.table:

library(data.table) # v1.10.0
setDT(data)[!data, on = .(id, time > time, value <= value)]
#    id time value
# 1:  a    0     1
# 2:  a    2     2
# 3:  b    0     3
# 4:  b    3     4

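Which basically says: "within each id, find the rows whose time is larger and whose value is smaller than or equal to that of another row, and don't select them (the ! sign)".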
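The same join condition can be spelled out without data.table, which may help in seeing what is being matched. The following base R sketch checks every pair of rows in group "b" with outer; it is only an illustration of the logic, not how data.table evaluates the join, and the names later, not_larger and dropped are made up for this example.

```r
# Group "b" from the question
b <- data.frame(time = c(0, 1, 2, 3), value = c(3, 1, 2, 4))

# later[i, j] is TRUE when row i has a larger time than row j
later <- outer(b$time, b$time, ">")

# not_larger[i, j] is TRUE when row i's value is <= row j's value
not_larger <- outer(b$value, b$value, "<=")

# A row is dropped when at least one earlier row has a value at least as large
dropped <- rowSums(later & not_larger) > 0

b[!dropped, ]   # keeps the rows with time 0 and time 3
```

This pairwise check is quadratic in the group size, so it is meant for understanding the condition rather than for large data.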

akrun · Answer 3 · 2017-01-19T14:09:03+0000

Here is an option using dplyr. After grouping by "id", we filter the rows where "value" is greater than the cumulative maximum of the lag of the "value" column.

library(dplyr)
data %>% 
  group_by(id) %>%
  filter(value > cummax(lag(value, default = 0)) ) 
#      id  time value
#   <fctr> <dbl> <dbl>
#1      a     0     1
#2      a     2     2
#3      b     0     3
#4      b     3     4

Or another option is slice after arranging by 'id' and 'time' (as the OP mentioned ordering):

data %>%
     group_by(id) %>%
     arrange(id, time) %>%
     slice(which(value > cummax(lag(value, default = 0))))
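Both dplyr versions pad the lag with default = 0, which relies on the values being positive. As a hedged sketch of how that assumption could be lifted, the base R code below pads with -Inf instead and applies the check per id with ave; exceeds_previous is a hypothetical helper name, not part of any package. In the dplyr code, passing default = -Inf to lag should play the same role.

```r
data <- data.frame(id = c(rep(c("a", "b"), each = 3L), "b"),
                   time = c(0, 1, 2, 0, 1, 2, 3),
                   value = c(1, 1, 2, 3, 1, 2, 4))

# Sort by id and time so that "earlier" means "higher up within the group"
data <- data[order(data$id, data$time), ]

# Hypothetical helper: TRUE where a value exceeds all values before it.
# Padding with -Inf keeps the first row even when values are zero or negative.
exceeds_previous <- function(v) v > cummax(c(-Inf, head(v, -1)))

# ave() applies the helper within each id; it returns 1/0 because the
# logical result is stored into a numeric vector, hence the comparison with 1
keep <- ave(data$value, data$id, FUN = exceeds_previous) == 1

result <- data[keep, ]
result
```

On the example data this keeps times 0 and 2 for id "a" and times 0 and 3 for id "b", matching the expected result in the question.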
