Is there a more elegant way to subset on multiple columns in R?

Question

I am subsetting data according to multiple criteria across multiple columns. I am selecting the rows of a data frame that contain any of several values, defined in the vector “criteria”, in any of three different columns.

I have some code that works, but I wonder what other (more elegant?) ways there are to do this. Here is what I did:

 criteria <- c(1:10)
 subset1 <- subset(data, data[, "Col1"] %in% criteria | data[, "Col2"] %in% criteria | data[, "Col3"] %in% criteria)

Suggestions are welcome. (I'm new to R, so very simple explanations of what you suggest are also warmly welcome.)

+4

r subset

user1257313 Mar 09 '12 at 21:49

2 answers

In this example, I use DF rather than data as the name of the data frame.

 DF[apply(apply(as.matrix(DF[c("Col1","Col2","Col3")]), c(1,2), `%in%`, criteria), 1, any),]

To break down what this does:

Make a matrix of the indicated columns. For each element of this matrix, test whether it is one of the criteria. Then, for each row of this matrix, check whether any of the elements in the row is TRUE. If so, keep the corresponding row of the original data set.

Working through an example:

Start with dummy data:

 DF <- data.frame(Col1=seq(1, by=2, length=10), Col2=seq(3, by=3, length=10), Col3=seq(7, by=1, length=10), other=LETTERS[1:10])

which looks like

 > DF
    Col1 Col2 Col3 other
 1     1    3    7     A
 2     3    6    8     B
 3     5    9    9     C
 4     7   12   10     D
 5     9   15   11     E
 6    11   18   12     F
 7    13   21   13     G
 8    15   24   14     H
 9    17   27   15     I
 10   19   30   16     J

Pull out only the columns of interest.

 > as.matrix(DF[c("Col1","Col2","Col3")])
       Col1 Col2 Col3
  [1,]    1    3    7
  [2,]    3    6    8
  [3,]    5    9    9
  [4,]    7   12   10
  [5,]    9   15   11
  [6,]   11   18   12
  [7,]   13   21   13
  [8,]   15   24   14
  [9,]   17   27   15
 [10,]   19   30   16

Check each entry against criteria:

 > apply(as.matrix(DF[c("Col1","Col2","Col3")]), c(1,2), `%in%`, criteria)
        Col1  Col2  Col3
  [1,]  TRUE  TRUE  TRUE
  [2,]  TRUE  TRUE  TRUE
  [3,]  TRUE  TRUE  TRUE
  [4,]  TRUE FALSE  TRUE
  [5,]  TRUE FALSE FALSE
  [6,] FALSE FALSE FALSE
  [7,] FALSE FALSE FALSE
  [8,] FALSE FALSE FALSE
  [9,] FALSE FALSE FALSE
 [10,] FALSE FALSE FALSE

Check whether any of the values in each row is TRUE:

 > apply(apply(as.matrix(DF[c("Col1","Col2","Col3")]), c(1,2), `%in%`, criteria), 1, any)
  [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

Use this to index the original data frame.

 > DF[apply(apply(as.matrix(DF[c("Col1","Col2","Col3")]), c(1,2), `%in%`, criteria), 1, any),]
   Col1 Col2 Col3 other
 1    1    3    7     A
 2    3    6    8     B
 3    5    9    9     C
 4    7   12   10     D
 5    9   15   11     E
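As a variation on the steps above (not part of the original answer), the element-wise test can also be done one column at a time, because %in% is already vectorized over its left-hand side. This is only a sketch under the same dummy data; the names cols, hits, keep and result are made up for illustration.

```r
# Same dummy data as in the walkthrough above
DF <- data.frame(Col1 = seq(1, by = 2, length.out = 10),
                 Col2 = seq(3, by = 3, length.out = 10),
                 Col3 = seq(7, by = 1, length.out = 10),
                 other = LETTERS[1:10])
criteria <- 1:10
cols <- c("Col1", "Col2", "Col3")  # illustrative name, not from the original post

# Test each column as a whole; sapply() binds the results into a logical matrix
hits <- sapply(DF[cols], function(col) col %in% criteria)

# A row is kept if at least one of its tested entries is TRUE
keep <- rowSums(hits) > 0

result <- DF[keep, ]
```

With this data, keep should select the same rows as the nested apply() version. Note that sapply() only returns a matrix when the data frame has more than one row, so a one-row data frame would need separate handling.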

+6

Brian Diggs Mar 09 '12 at 22:01

nograpes · Accepted Answer · 2012-03-09T22:16:24+0000

I'm not sure you need two calls to apply:

 # Data
 df <- data.frame(x=1:4, Col1=c(11,12,3,13), Col2=c(9,12,10,13), Col3=c(9,13,42,23))
 criteria <- 1:10
 # Solution: for each row, keep it if any of the three values is in criteria
 df[apply(df[c('Col1','Col2','Col3')], 1, function(x) any(x %in% criteria)), ]

If you do not have many columns, it is probably more readable to write:

 subset(df, Col1 %in% criteria | Col2 %in% criteria | Col3 %in% criteria)
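If the list of columns grows, the chain of | conditions could also be built programmatically rather than typed out. The following is a minimal sketch of that idea using lapply() and Reduce() from base R; it is a swapped-in technique, not something from the answer above, and the names cols, tests, keep and result are hypothetical.

```r
# Same data as in the answer above
df <- data.frame(x = 1:4,
                 Col1 = c(11, 12, 3, 13),
                 Col2 = c(9, 12, 10, 13),
                 Col3 = c(9, 13, 42, 23))
criteria <- 1:10
cols <- c("Col1", "Col2", "Col3")  # illustrative name

# Test each column against criteria, giving a list of logical vectors
tests <- lapply(df[cols], `%in%`, criteria)

# Combine the per-column results with element-wise OR
keep <- Reduce(`|`, tests)

result <- df[keep, ]
```

Here keep plays the same role as the condition passed to subset(), so adding a column to the test only means adding its name to cols.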
