Subsetting data in R using two criteria, one of which is a regular expression

Question

I have a dataset something like this:

col_a  col_b     col_c
1      abc_boy   1
2      abc_boy   2
1      abc_girl  1
2      abc_girl  2

I need to select only the first row, based on col_b and col_c, and then change its value in col_c, something like this:

df[grep("_boy$",df[,"col_b"]) & df[,"col_c"]=="1","col_c"] <- "yes"

But the above code does not work, since the first and second criteria do not come from the same set.

I could do it clumsily with an explicit loop, or with a "two-level" subset, something like this:

 df.a <- df[grep("_boy$",df[,"col_b"]),]              #1
 df.b <- df[grep("_boy$",df[,"col_b"],invert=TRUE),]  #2
 df.a[df.a[,"col_c"]=="1","col_c"] <- "yes"           #3
 df.a[df.a[,"col_c"]=="2","col_c"] <- "no"            #4
 df <- rbind(df.a,df.b)                               #5

But I would prefer not to do this. Can someone enlighten me on how to "merge" #1 and #3? Thanks.

+4

regex r subset

lokheart Dec 27 '11 at 13:21

2 answers

The reason it doesn't work as you expected, despite the correct logic, is that you used grep where you should have used grepl. Try instead:

 df[ grepl("_boy$", df[,"col_b"]) & df[,"col_c"]=="1", "col_c"] <- "yes"
 > df
   col_a    col_b col_c
 1     1  abc_boy   yes
 2     2  abc_boy     2
 3     1 abc_girl     1
 4     2 abc_girl     2

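grepl returns a logical vector of the same length as its input, whereas grep returns a numeric vector of the matching indices, which in this case is shorter. When that index vector is combined with & against the col_c comparison, its values are coerced to TRUE and recycled, so the regex condition is effectively lost.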
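To make the difference concrete, here is a small sketch that rebuilds the example data and compares the two functions. The data frame construction is an assumption based on the table in the question (in particular, col_c is stored as character), and the names idx, mask, wrong and right are only illustrative.

```r
# Rebuild the question's example data (assumed layout; col_c kept as character)
df <- data.frame(
  col_a = c(1, 2, 1, 2),
  col_b = c("abc_boy", "abc_boy", "abc_girl", "abc_girl"),
  col_c = c("1", "2", "1", "2"),
  stringsAsFactors = FALSE
)

idx  <- grep("_boy$", df[, "col_b"])   # integer indices of the matching rows
mask <- grepl("_boy$", df[, "col_b"])  # one TRUE/FALSE per row

# With grep, the indices are coerced to TRUE and recycled to length 4,
# so the regex condition is lost and the "abc_girl" row 3 slips through.
wrong <- idx & df[, "col_c"] == "1"

# With grepl, both conditions are logical vectors of the same length.
right <- mask & df[, "col_c"] == "1"

print(idx)
print(mask)
print(wrong)
print(right)
```

Here wrong selects rows 1 and 3, while right selects only row 1, which is the row the question is after.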

+6

42- Dec 27 '11 at 13:56

rcs · Accepted Answer · 2011-12-27T13:55:49+0000

Try grepl instead of grep. grepl returns a logical vector (match or not for each element of x), which can be combined with logical operators.
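As a rough sketch of how this replaces the whole split-and-rbind workaround from the question, the regex test can be computed once and reused for both recodes. The data frame below is reconstructed from the question's table (an assumption, with col_c as character), and is_boy is a hypothetical helper name.

```r
# Example data reconstructed from the question (assumed; col_c as character)
df <- data.frame(
  col_a = c(1, 2, 1, 2),
  col_b = c("abc_boy", "abc_boy", "abc_girl", "abc_girl"),
  col_c = c("1", "2", "1", "2"),
  stringsAsFactors = FALSE
)

# One logical value per row: does col_b end in "_boy"?
is_boy <- grepl("_boy$", df$col_b)

# Combine the regex condition with the col_c condition and assign in place.
df[is_boy & df$col_c == "1", "col_c"] <- "yes"
df[is_boy & df$col_c == "2", "col_c"] <- "no"

print(df)
```

The "abc_girl" rows are left untouched, so no separate subset or rbind is needed.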
