R: select strings based on several conditions using dplyr and reshape2

df <- data.frame( exp=c(1,1,2,2), name=c("gene1", "gene2", "gene1", "gene2"), value=c(1,1,3,-1) ) 

While getting used to dplyr and reshape2, I got stuck on a "simple" task: selecting strings based on several conditions. I want the genes (variable name) that have a value above 0 in experiment 1 (exp == 1) and, at the same time, a value below 0 in experiment 2; in df this would be "gene2". Of course, there must be many ways to do this, for example subsetting df once for each set of conditions (exp == 1 and value > 0, then exp == 2 and value < 0) and then joining the resulting subsets:

 library(dplyr)
 # Join the two filtered subsets on the gene name, then extract the name column
 inner_join(filter(df, exp == 1 & value > 0),
            filter(df, exp == 2 & value < 0),
            by = "name")[["name"]]

Although this works, it looks very awkward, and I feel that this kind of conditional filtering is exactly what reshape2 and dplyr are for, but I cannot figure out how to do it. Can someone enlighten me here?

4 answers

One option that comes to mind is to convert the data to a "wide" format and then filter.

Here is an example using "data.table" (for the convenience of compound statements):

 library(data.table)
 dcast.data.table(as.data.table(df), name ~ exp)[`1` > 0 & `2` < 0]
 #     name 1  2
 # 1: gene2 1 -1

Similarly, with "dplyr" and "tidyr":

 library(dplyr)
 library(tidyr)
 df %>%
   spread(exp, value) %>%
   filter(`1` > 0 & `2` < 0)
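In more recent tidyr releases, spread() is superseded by pivot_wider(). A minimal sketch of the same wide-then-filter idea, assuming tidyr 1.0.0 or later and the df from the question (the name hits is only illustrative and not part of the original answer):

```r
library(dplyr)
library(tidyr)

# Data frame from the question
df <- data.frame(
  exp   = c(1, 1, 2, 2),
  name  = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1)
)

# One row per gene, one column per experiment (columns are named `1` and `2`)
hits <- df %>%
  pivot_wider(names_from = exp, values_from = value) %>%
  filter(`1` > 0, `2` < 0)

# Names of the genes that satisfy both conditions
hits$name
```

The wide layout keeps each condition readable as a plain column comparison, at the cost of one reshaping step.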

Another dplyr option:

 group_by(df, name) %>%
   filter(value[exp == 1] > 0 & value[exp == 2] < 0)
 #Source: local data frame [2 x 3]
 #Groups: name
 #
 #  exp  name value
 #1   1 gene2     1
 #2   2 gene2    -1
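The indexing form above appears to assume exactly one row per gene and experiment. A hedged variant that wraps each condition in any(), so that a gene missing from one experiment is simply dropped, might look like the sketch below. The gene3 row and the names df2 and genes are hypothetical additions for illustration only:

```r
library(dplyr)

# Data frame from the question
df <- data.frame(
  exp   = c(1, 1, 2, 2),
  name  = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1)
)

# Hypothetical extra gene that was measured only in experiment 1
df2 <- rbind(df, data.frame(exp = 1, name = "gene3", value = 2))

genes <- df2 %>%
  group_by(name) %>%
  # any() yields a single TRUE/FALSE per group, even when no row matches
  filter(any(exp == 1 & value > 0), any(exp == 2 & value < 0)) %>%
  ungroup() %>%
  distinct(name) %>%
  pull(name)

genes
```

Because any() returns FALSE for an empty selection, groups without a matching row fail the condition rather than producing a zero-length result.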

This is probably even more convoluted than your own solution, but I think it has more of a "dplyr" feel:

 df %>%
   filter((exp == 1 & value > 0) | (exp == 2 & value < 0)) %>%
   group_by(name) %>%
   filter(length(unique(exp)) == 2) %>%
   select(name) %>%
   unique()
 #Source: local data frame [1 x 1]
 #Groups: name
 #
 #   name
 #1 gene2

filter accepts several conditions separated by commas, the same as select. Each additional condition is combined with AND:

group_by(df, name) %>% filter(value[exp == 1] > 0, value[exp == 2] < 0)

From the official documentation: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html

The given examples:

  • flights[flights$month == 1 & flights$day == 1, ] in base R

  • filter(flights, month == 1, day == 1) in dplyr.
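To make the comma rule concrete on the data from the question, here is a small sketch comparing the comma form with an explicit &. The names with_comma and with_amp are illustrative only:

```r
library(dplyr)

# Data frame from the question
df <- data.frame(
  exp   = c(1, 1, 2, 2),
  name  = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1)
)

# Conditions separated by commas are combined with AND ...
with_comma <- filter(df, exp == 1, value > 0)

# ... so this should select the same rows as a single condition using &
with_amp <- filter(df, exp == 1 & value > 0)

identical(with_comma, with_amp)
```

Both calls keep the two experiment 1 rows, since both genes have a positive value there.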


Source: https://habr.com/ru/post/978947/

