Identify and replace duplicate blocks in a character vector

I have a character vector a.

a = c("Chemistry", "Chemistry", "Math","English","Math","Math","Physics","Physics","Chemistry")

Is there a quick and easy way to replace each consecutive repeat with NA, so that the result looks like this?

c("Chemistry", NA, "Math","English","Math",NA,"Physics",NA,"Chemistry")

I tried diff and duplicated but did not get what I want.

2 answers

We can compare adjacent elements of the vector to get a logical vector and assign NA to the positions where it is TRUE.

a[c(FALSE,a[-1]==a[-length(a)])] <- NA
a
#[1] "Chemistry" NA          "Math"      "English"   "Math"      NA          "Physics"   NA          "Chemistry"
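To make the one-liner easier to follow, here is a step-by-step sketch of the same comparison. The names same_as_prev, is_repeat and b are only illustrative and do not appear in the original answer; the work is done on a copy so that a itself stays unchanged.

```r
a <- c("Chemistry", "Chemistry", "Math", "English", "Math", "Math",
       "Physics", "Physics", "Chemistry")

# Compare every element from the 2nd onward with its predecessor.
same_as_prev <- a[-1] == a[-length(a)]

# The first element has no predecessor, so it can never be a repeat.
is_repeat <- c(FALSE, same_as_prev)

# Work on a copy (b is an illustrative name) and blank out the repeats.
b <- a
b[is_repeat] <- NA

print(is_repeat)
print(b)
```

Only elements equal to their immediate predecessor are replaced, which is why the last "Chemistry" survives even though the value occurred earlier.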

Or, since the OP mentioned diff, we can convert the vector to a factor, coerce it to numeric, use diff, and then assign NA wherever the difference is zero.

a[c(FALSE,!diff(as.numeric(factor(a))))] <- NA
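The diff variant can be unpacked in the same way. This is a sketch only: codes, steps and b are illustrative names, and as.integer is used in place of as.numeric purely as a stylistic assumption (both return the factor's level codes here).

```r
a <- c("Chemistry", "Chemistry", "Math", "English", "Math", "Math",
       "Physics", "Physics", "Chemistry")

# Each distinct string gets an integer code through its factor level.
codes <- as.integer(factor(a))

# diff() is 0 exactly where two neighbouring codes (and so strings) are equal.
steps <- diff(codes)

# !steps would also work, since 0 coerces to FALSE; == 0 states the intent.
b <- a
b[c(FALSE, steps == 0)] <- NA

print(steps == 0)
print(b)
```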

Or using duplicated on the run ids returned by rleid from data.table

library(data.table)
a[duplicated(rleid(a))] <- NA
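If data.table is not installed, the run ids can be approximated in base R with rle. This is a hedged sketch rather than a drop-in replacement for rleid: run_id and b are illustrative names, and behaviour on vectors that contain NA has not been compared with rleid here.

```r
a <- c("Chemistry", "Chemistry", "Math", "English", "Math", "Math",
       "Physics", "Physics", "Chemistry")

# rle() describes the vector as runs of equal consecutive values.
r <- rle(a)

# Give every run its own id and repeat that id once per element of the run.
run_id <- rep(seq_along(r$lengths), r$lengths)

# Within a run, every element after the first has a duplicated id.
b <- a
b[duplicated(run_id)] <- NA

print(run_id)
print(b)
```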

Here's an approach that uses rle and replace:

replace(a, sequence(rle(a)$lengths) > 1, NA)
# [1] "Chemistry" NA          "Math"      "English"   "Math"      NA         
# [7] "Physics"   NA          "Chemistry"
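The intermediate values show why this works. The sketch below uses illustrative names (run_lengths, pos_in_run, b) that are not part of the original answer.

```r
a <- c("Chemistry", "Chemistry", "Math", "English", "Math", "Math",
       "Physics", "Physics", "Chemistry")

# Length of each run of equal consecutive values.
run_lengths <- rle(a)$lengths

# sequence() counts 1, 2, ... within each run, restarting at every new run.
pos_in_run <- sequence(run_lengths)

# Any position greater than 1 is a repeat inside its run.
b <- replace(a, pos_in_run > 1, NA)

print(run_lengths)
print(pos_in_run)
print(b)
```

Because replace returns a modified copy, a is left untouched unless the result is assigned back to it.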

Source: https://habr.com/ru/post/1628347/

