Subsetting a data frame by number of repetitions

Question

If I have a dataframe like this:

    neu <- data.frame(
      test1 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14),
      test2 = c("a","b","a","b","c","c","a","c","c","d","d","f","f","f")
    )
    neu
       test1 test2
    1      1     a
    2      2     b
    3      3     a
    4      4     b
    5      5     c
    6      6     c
    7      7     a
    8      8     c
    9      9     c
    10    10     d
    11    11     d
    12    12     f
    13    13     f
    14    14     f

and I would like to select only those rows where the level of the factor test2 appears more than three times, what would be the fastest way?

Thank you very much. I could not find a suitable answer in the previous questions.

+6

r

Miri putzig May 16 '13 at 11:38

4 answers

Here's another way:

    with(neu, neu[ave(seq(test2), test2, FUN=length) > 3, ])
    #   test1 test2
    # 5     5     c
    # 6     6     c
    # 8     8     c
    # 9     9     c
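To see why this works, it may help to look at the intermediate vector. A minimal sketch, assuming the neu data frame from the question (the names group_size and result are mine, not part of the answer): ave() applies length within each group of test2 and returns the result aligned with the original rows, so every row carries the size of its own group.

```r
neu <- data.frame(
  test1 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14),
  test2 = c("a","b","a","b","c","c","a","c","c","d","d","f","f","f")
)

# One value per row of neu: the number of rows sharing that row's test2 value.
group_size <- ave(seq_along(neu$test2), neu$test2, FUN = length)
print(group_size)

# Keep only the rows whose group has more than three members.
result <- neu[group_size > 3, ]
print(result)
```

Because the comparison is made on a vector as long as the data frame, no separate lookup of level names is needed.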

+5

Matthew plourde May 16 '13 at 11:52

I would use count from the plyr package to do the counting:

    library(plyr)
    count_result = count(neu, "test2")
    matching = with(count_result, test2[freq > 3])
    with(neu, test1[test2 %in% matching])
    # [1] 5 6 8 9

+3

Paul hiemstra May 16 '13 at 11:50

The data.table approach (scales best):

    library(data.table)
    dt = data.table(neu)
    dt[dt[, .I[.N >= 3], by = test2]$V1]

Note: I hope that in the future the following simpler syntax will also be a fast way to do this:

 dt[, .SD[.N >= 3], by = test2]

(cf. Group subset with data.table)
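For reference, here is a self-contained version of the first form, written with > 3 to match the "more than three times" wording in the question. It is only a sketch: idx and res are illustrative names, and it assumes the data.table package is installed.

```r
library(data.table)

neu <- data.frame(
  test1 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14),
  test2 = c("a","b","a","b","c","c","a","c","c","d","d","f","f","f")
)
dt <- as.data.table(neu)

# Within each test2 group, .N is the group size and .I holds the row numbers
# of that group in dt. .I[.N > 3] keeps those row numbers only when the group
# has more than three rows; the collected indices end up in column V1.
idx <- dt[, .I[.N > 3], by = test2]$V1

# Index the original table once with the collected row numbers.
res <- dt[idx]
print(res)
```

Collecting row numbers first and subsetting once is the idea behind the .I form; the .SD form expresses the same filter by subsetting each group's data directly.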

+2

eddi May 16 '13 at 14:47

Thomas · Accepted Answer · 2013-05-16T11:49:14+0000

Find the levels that occur at least three times (use > 3 instead for strictly more than three, as in the question):

 z <- table(neu$test2)[table(neu$test2) >= 3] # repeats greater than or equal to 3 times

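Or, to get the level names directly (with this form z is already a character vector, so use z rather than names(z) when subsetting):

 z <- names(which(table(neu$test2)>=3))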

Then subset with:

 subset(neu, test2 %in% names(z))

Or:

 neu[neu$test2 %in% names(z),]
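Putting the steps together, a small sketch might look like the following. The helper subset_by_count and its arguments are hypothetical names introduced here for illustration, not part of the answer; it assumes the neu data frame from the question.

```r
neu <- data.frame(
  test1 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14),
  test2 = c("a","b","a","b","c","c","a","c","c","d","d","f","f","f")
)

# Hypothetical helper: keep the rows whose value in column `col`
# occurs at least `min_count` times in that column.
subset_by_count <- function(df, col, min_count) {
  counts <- table(df[[col]])                            # frequency of each level
  keep <- names(counts)[as.vector(counts) >= min_count] # level names that qualify
  df[df[[col]] %in% keep, , drop = FALSE]
}

at_least_3  <- subset_by_count(neu, "test2", 3)  # levels a, c and f
more_than_3 <- subset_by_count(neu, "test2", 4)  # level c only
print(more_than_3)
```

Taking the names from the table inside the helper sidesteps the question of whether z holds counts or level names.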
