I need to remove rows from a data frame based on repeated values in a given column, but only when the repeats are consecutive. For example, for the following data frame:
df = data.frame(x=c(1,1,1,2,2,4,2,2,1))
df$y <- c(10,11,30,12,49,13,12,49,30)
df$z <- c(1,2,3,4,5,6,7,8,9)
x y z
1 10 1
1 11 2
1 30 3
2 12 4
2 49 5
4 13 6
2 12 7
2 49 8
1 30 9
I need to delete the rows with consecutive duplicate values in column x, keeping the last row of each run and preserving the data frame structure:
x y z
1 30 3
2 49 5
4 13 6
2 49 8
1 30 9
Following the help pages and some other posts, I tried to use the function duplicated:
df[ !duplicated(x,fromLast=TRUE), ] # which gives me this:
x y z
1 1 10 1
6 4 13 6
7 2 12 7
9 1 30 9
NA NA NA NA
NA.1 NA NA NA
NA.2 NA NA NA
NA.3 NA NA NA
NA.4 NA NA NA
NA.5 NA NA NA
NA.6 NA NA NA
NA.7 NA NA NA
NA.8 NA NA NA
I don't know why I get the NA rows at the end (that did not happen in another test with the same table), and the result is only partially correct for the remaining rows.
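One possible explanation for the NA rows, stated as an assumption because the workspace is not shown: the bare `x` in the call is not looked up inside `df`, so it may refer to some other, longer vector in the session. The sketch below uses a hypothetical vector `x_other` to reproduce that effect, then shows what the same call returns when it is given `df$x` explicitly.

```r
df <- data.frame(x = c(1, 1, 1, 2, 2, 4, 2, 2, 1),
                 y = c(10, 11, 30, 12, 49, 13, 12, 49, 30),
                 z = c(1, 2, 3, 4, 5, 6, 7, 8, 9))

# Hypothetical stand-in for an unrelated, longer `x` left in the workspace
x_other <- 1:11

# The logical index is longer than nrow(df), so the extra positions
# come back as rows filled with NA
bad <- df[!duplicated(x_other, fromLast = TRUE), ]

# Referring to the column explicitly avoids the NA rows, but it keeps only
# the last occurrence of each value overall, not the last of each run
fixed <- df[!duplicated(df$x, fromLast = TRUE), ]
```

With `df$x`, the call keeps one row per distinct value, so it still does not implement the consecutive-only rule.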
I also tried using the package data.table as follows:
library(data.table)
dt <- as.data.table(df)
setkey(dt, x)
dt[J(unique(x)), mult ='last']
which gives me this:
x y z
1 30 9
2 49 8
4 13 6
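For comparison, the rule described earlier (keep the last row of each consecutive run in x) could be sketched with base R's `rle()`. This is only a minimal sketch, assuming x contains no NA values; the names `run_ends` and `result` are illustrative and not part of the attempts above.

```r
df <- data.frame(x = c(1, 1, 1, 2, 2, 4, 2, 2, 1),
                 y = c(10, 11, 30, 12, 49, 13, 12, 49, 30),
                 z = c(1, 2, 3, 4, 5, 6, 7, 8, 9))

# rle() splits x into runs of equal consecutive values; the cumulative sum
# of the run lengths is the row number at which each run ends
run_ends <- cumsum(rle(df$x)$lengths)

# Keep only the last row of each run; the result is still a data frame
result <- df[run_ends, ]
```

Because the runs are computed on the original row order, nothing is sorted or keyed, which is what distinguishes this from the `setkey()` attempt.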