Subsetting data in R using two criteria, one of which is a regular expression

I have a dataset that looks something like this:

    col_a col_b    col_c
    1     abc_boy  1
    2     abc_boy  2
    1     abc_girl 1
    2     abc_girl 2
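
To make the example reproducible, here is a minimal sketch that builds the data frame shown above. The name df matches the question's code; storing col_c as numbers rather than strings is an assumption, since the listing doesn't say which type it has.

```r
# Rebuild the example data (col_c stored as numeric is an assumption)
df <- data.frame(
  col_a = c(1, 2, 1, 2),
  col_b = c("abc_boy", "abc_boy", "abc_girl", "abc_girl"),
  col_c = c(1, 2, 1, 2),
  stringsAsFactors = FALSE  # keep col_b as character on R versions before 4.0
)
print(df)
```

Either type works with comparisons such as df[,"col_c"]=="1", because R converts the number to a string before comparing.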

I need to select only the first row, based on both col_b and col_c, and then change its value in col_c, something like this:

    df[grep("_boy$", df[,"col_b"]) & df[,"col_c"]=="1", "col_c"] <- "yes"

But the code above doesn't work, because the first criterion and the second criterion don't refer to the same set of rows.
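
One way to see why the two criteria don't combine is to look at what each returns on its own. A minimal sketch, rebuilding the example data with col_c as numeric (an assumption):

```r
df <- data.frame(
  col_b = c("abc_boy", "abc_boy", "abc_girl", "abc_girl"),
  col_c = c(1, 2, 1, 2),  # numeric here; an assumption about the real data
  stringsAsFactors = FALSE
)

# grep() returns the positions of the matching elements (length 2 here)
by_regex <- grep("_boy$", df[, "col_b"])
print(by_regex)   # 1 2

# The comparison returns one TRUE/FALSE per row (length 4)
by_value <- df[, "col_c"] == "1"
print(by_value)   # TRUE FALSE TRUE FALSE
```

One result is a vector of row numbers, the other a row-by-row logical mask, so combining them with & does not express "rows matching both conditions".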

I can do it the clumsy way, with an explicit loop or with a "two-level" subset, something like this:

    df.a <- df[grep("_boy$", df[,"col_b"]), ]               #1 rows ending in _boy
    df.b <- df[grep("_boy$", df[,"col_b"], invert=TRUE), ]  #2 all other rows
    df.a[df.a[,"col_c"]=="1", "col_c"] <- "yes"             #3
    df.a[df.a[,"col_c"]=="2", "col_c"] <- "no"              #4
    df <- rbind(df.a, df.b)                                 #5 put the pieces back together

But I'd rather not do this. Can someone tell me how to "merge" #1 and #3? Thanks.

2 answers

Try grepl instead of grep. grepl returns a logical vector (whether or not each element of x matches), which can be combined with other logical vectors using logical operators such as &.
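
Applied to the workaround in the question, grepl lets both updates (#3 and #4) run directly on the full data frame, so the split into df.a/df.b and the final rbind become unnecessary, and the original row order is kept. A sketch, rebuilding the example data with col_c as numeric (an assumption); is_boy is just an illustrative name:

```r
df <- data.frame(
  col_a = c(1, 2, 1, 2),
  col_b = c("abc_boy", "abc_boy", "abc_girl", "abc_girl"),
  col_c = c(1, 2, 1, 2),  # numeric here; an assumption about the real data
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per row: does col_b end in "_boy"?
is_boy <- grepl("_boy$", df[, "col_b"])

# Both masks have length nrow(df), so & pairs them up row by row
df[is_boy & df[, "col_c"] == "1", "col_c"] <- "yes"  # #1 and #3 in one step
df[is_boy & df[, "col_c"] == "2", "col_c"] <- "no"   # #1 and #4 in one step
print(df)
```

The first assignment turns col_c into a character column, which is why the second comparison still uses the string "2".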

The reason it doesn't work as you expected, even though your logic is correct, is that you use grep where you should use grepl. Try this instead:

    df[grepl("_boy$", df[,"col_b"]) & df[,"col_c"]=="1", "col_c"] <- "yes"
    > df
      col_a    col_b col_c
    1     1  abc_boy   yes
    2     2  abc_boy     2
    3     1 abc_girl     1
    4     2 abc_girl     2

grepl returns a logical vector of the same length as its argument, while grep returns a (usually shorter) numeric vector of matching indices, which gets recycled in this case.
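
To see concretely what the recycling does, here is a sketch that evaluates the mask from the question's original line next to the grepl version, on the example data with col_c as numeric (an assumption). grep gives c(1, 2); inside & both numbers count as TRUE and are recycled to length 4, so the regular-expression condition effectively disappears:

```r
df <- data.frame(
  col_b = c("abc_boy", "abc_boy", "abc_girl", "abc_girl"),
  col_c = c(1, 2, 1, 2),  # numeric here; an assumption about the real data
  stringsAsFactors = FALSE
)

# grep() -> c(1, 2); `&` treats nonzero numbers as TRUE and recycles them
wrong_mask <- grep("_boy$", df[, "col_b"]) & df[, "col_c"] == "1"
print(wrong_mask)  # TRUE FALSE TRUE FALSE: the abc_girl row 3 is selected too

# grepl() -> one logical per row, so the two conditions line up
right_mask <- grepl("_boy$", df[, "col_b"]) & df[, "col_c"] == "1"
print(right_mask)  # TRUE FALSE FALSE FALSE
```

Here the lengths (2 and 4) divide evenly, so R doesn't even warn; it only warns when the longer length is not a multiple of the shorter one.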

Source: https://habr.com/ru/post/1388129/