Cut off data.table rows after the first duplicate

Suppose I have the following dataset:

library(data.table)
dt <- data.table(x = c(1, 2, 4, 5, 2, 3, 4))

> dt
   x
1: 1
2: 2
3: 4
4: 5
5: 2
6: 3
7: 4

I would like to cut the table off after the 4th row, since the first duplicate (the number 2) occurs in the row after it.

Expected Result:

   x
1: 1
2: 2
3: 4
4: 5

Needless to say, I'm not looking for dt[1:4, ,][] because the real data set is more "complex".

I tried with shift() and .I, but it did not work. One idea was: dt[x %in% dt$x[1:(.I - 1)], .SD, ][].

+4
2 answers

Maybe we can use duplicated():

dt[seq_len(which(duplicated(x))[1]-1)]
#   x
#1: 1
#2: 2
#3: 4
#4: 5

Or as @lmo suggested

dt[seq_len(which.max(duplicated(dt))-1)]
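To see why the first variant works, the intermediate values can be inspected one step at a time. This is only a sketch on the question's data; the names dup, first_dup and res are introduced here for illustration.

```r
library(data.table)

dt <- data.table(x = c(1, 2, 4, 5, 2, 3, 4))

# TRUE marks every value that has already appeared earlier in the column
dup <- duplicated(dt$x)
# FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE

# Row number of the first repeated value
first_dup <- which(dup)[1]
# 5

# Keep only the rows before that position
res <- dt[seq_len(first_dup - 1)]
res
```

Note that which(dup)[1] is NA when the column has no repeated values, so this variant assumes at least one duplicate exists.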
+6

Here is another option:

dt[seq_len(anyDuplicated(x)-1L)]

From the help files:

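anyDuplicated(): an integer or real vector of length one with value the 1-based index of first duplicate if any, otherwise 0.

Note that if there are no duplicates, anyDuplicated(x) returns 0, so seq_len() receives -1 and throws an error.

To handle that case as well:

dt[if((ix <- anyDuplicated(x)-1L) > 0) seq_len(ix) else seq_len(.N)]

If there are no duplicates, this returns all rows.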
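As a rough sketch, the guarded expression above could be wrapped in a small helper so that it can be reused on other tables and columns. The name cut_before_dup and the table dt2 are hypothetical and not part of data.table or of the original answer.

```r
library(data.table)

# Hypothetical helper: keep the rows before the first repeated value in `col`
cut_before_dup <- function(d, col) {
  # anyDuplicated() gives the index of the first duplicate, or 0 if there is none,
  # so ix is -1 when the column has no duplicates
  ix <- anyDuplicated(d[[col]]) - 1L
  if (ix > 0L) d[seq_len(ix)] else d
}

dt  <- data.table(x = c(1, 2, 4, 5, 2, 3, 4))
dt2 <- data.table(x = c(1, 2, 3))  # no duplicates

cut_before_dup(dt, "x")   # rows 1 to 4
cut_before_dup(dt2, "x")  # all 3 rows
```

Returning the table unchanged in the no-duplicate branch mirrors the seq_len(.N) fallback in the one-liner.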

+5

Source: https://habr.com/ru/post/1684749/