Subsetting a data frame by the number of repetitions

If I have a dataframe like this:

    neu <- data.frame(test1 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14),
                      test2 = c("a","b","a","b","c","c","a","c","c","d","d","f","f","f"))
    neu
       test1 test2
    1      1     a
    2      2     b
    3      3     a
    4      4     b
    5      5     c
    6      6     c
    7      7     a
    8      8     c
    9      9     c
    10    10     d
    11    11     d
    12    12     f
    13    13     f
    14    14     f

and I would like to select only those rows where the level of the factor test2 appears more than three times, what would be the fastest way?

Thank you very much. I could not find a suitable answer in the previous questions.

4 answers

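Find the levels that qualify using:

    z <- table(neu$test2)[table(neu$test2) >= 3]  # levels repeated three or more times

Or, to get the level names directly as a character vector:

    z <- names(which(table(neu$test2) >= 3))

Then, if z is the table from the first form, subset with:

    subset(neu, test2 %in% names(z))

Or:

    neu[neu$test2 %in% names(z), ]

If z came from the second form it already holds the names, so names(z) would be NULL. In that case use test2 %in% z instead.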
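
The question asks for levels that appear more than three times, while the code above uses >= 3, and the two thresholds select different rows on this data. A minimal sketch in base R that makes the intermediate counts visible (the names counts, at_least_3 and more_than_3 are only illustrative, and test1 is written as 1:14 for brevity):

```r
# Sample data from the question
neu <- data.frame(
  test1 = 1:14,
  test2 = c("a","b","a","b","c","c","a","c","c","d","d","f","f","f")
)

# Number of occurrences of each level
counts <- table(neu$test2)
# a b c d f
# 3 2 4 2 3

at_least_3  <- names(counts)[counts >= 3]  # "a" "c" "f"
more_than_3 <- names(counts)[counts > 3]   # "c"

# Rows whose level occurs strictly more than three times (rows 5, 6, 8, 9)
neu[neu$test2 %in% more_than_3, ]
```

With > 3 only the four c rows remain, which is what the question describes.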

Here's another way:

    with(neu, neu[ave(seq_along(test2), test2, FUN = length) > 3, ])
    #   test1 test2
    # 5     5     c
    # 6     6     c
    # 8     8     c
    # 9     9     c
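
To see why this works, it may help to look at what ave() returns on its own: one value per row, here the size of the group that row belongs to. A small sketch that rebuilds the question's data (group_size is an illustrative name, and test1 is written as 1:14 for brevity):

```r
neu <- data.frame(
  test1 = 1:14,
  test2 = c("a","b","a","b","c","c","a","c","c","d","d","f","f","f")
)

# For every row, the number of rows sharing its test2 value
group_size <- ave(seq_along(neu$test2), neu$test2, FUN = length)
group_size
# [1] 3 2 3 2 4 4 3 4 4 2 2 3 3 3

# Because group_size is aligned with the rows, it can index them directly
neu[group_size > 3, ]
```

Since the counts line up with the rows, no separate lookup of level names is needed.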

I would use count from the plyr package to do the counting:

    library(plyr)
    count_result = count(neu, "test2")               # one row per level, with a freq column
    matching = with(count_result, test2[freq > 3])   # levels occurring more than three times
    with(neu, test1[test2 %in% matching])
    # [1] 5 6 8 9

The data.table method (scales best):

    library(data.table)
    dt = data.table(neu)
    dt[dt[, .I[.N >= 3], by = test2]$V1]

Note: hopefully the following simpler syntax will become a fast way to do this in the future:

    dt[, .SD[.N >= 3], by = test2]

(cf. "Group subset with data.table")
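
As a rough illustration of the inner step, assuming the data.table package is installed: .I holds the row numbers of the current group within the full table and .N the group size, so .I[.N >= 3] yields every row number of a large enough group and nothing otherwise. The name idx is introduced here only for illustration, and test1 is written as 1:14 for brevity.

```r
library(data.table)

dt <- data.table(
  test1 = 1:14,
  test2 = c("a","b","a","b","c","c","a","c","c","d","d","f","f","f")
)

# One row per kept row number; the unnamed result column is called V1
idx <- dt[, .I[.N >= 3], by = test2]
idx$V1
# [1]  1  3  7  5  6  8  9 12 13 14

# Index the original table by those row numbers
dt[idx$V1]

# Same rows, restored to their original order
dt[sort(idx$V1)]
```

The rows come back grouped by test2 in order of first appearance. Wrapping the index in sort() restores the original row order if that matters.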


Source: https://habr.com/ru/post/945154/