R get rows based on several conditions - use dplyr and reshape2

Question

df <- data.frame( exp=c(1,1,2,2), name=c("gene1", "gene2", "gene1", "gene2"), value=c(1,1,3,-1) )

While getting used to dplyr and reshape2, I ran into a seemingly "simple" task: selecting rows based on several conditions. Suppose I want the genes (variable name) that have a value above 0 in experiment 1 (exp == 1) and, at the same time, a value below 0 in experiment 2; in df that would be "gene2". There are surely many ways to do this. For example, subset df once for each set of conditions (exp == 1 & value > 0, and exp == 2 & value < 0) and then join the two subsets:

 library(dplyr)
 inner_join(filter(df, exp == 1 & value > 0),
            filter(df, exp == 2 & value < 0),
            by = c("name" = "name"))$name

Although this works, it looks very clumsy. I feel that this kind of conditional filtering is at the heart of reshape2 and dplyr, but I cannot figure out how to do it. Can someone enlighten me?

+6

r conditional filtering dplyr reshape2

user3375672 Dec 01 '14 at 15:09

4 answers

Another dplyr option:

 group_by(df, name) %>%
   filter(value[exp == 1] > 0 & value[exp == 2] < 0)
 #Source: local data frame [2 x 3]
 #Groups: name
 #
 #  exp  name value
 #1   1 gene2     1
 #2   2 gene2    -1
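The grouped filter above relies on every gene having exactly one row per experiment, because value[exp == 1] is compared as a single number. If that cannot be guaranteed, one possible variation (a sketch, not part of the original answer) wraps each condition in any(), so a gene is kept when at least one of its rows satisfies each condition. The variable name hits is only illustrative.

```r
library(dplyr)

df <- data.frame(
  exp   = c(1, 1, 2, 2),
  name  = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1)
)

# Keep a gene only if it has at least one positive value in experiment 1
# and at least one negative value in experiment 2.
hits <- df %>%
  group_by(name) %>%
  filter(any(exp == 1 & value > 0), any(exp == 2 & value < 0)) %>%
  ungroup() %>%
  distinct(name) %>%   # one row per surviving gene
  pull(name)           # return the names as a plain vector

hits
```

With the example df, only gene2 passes both conditions. A gene that is missing from one experiment is simply dropped, since any() over an empty selection is FALSE.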

+4

docendo discimus Dec 01 '14 at 15:42

This is probably even more convoluted than your own solution, but I think it has more of a "dplyr" feel:

 df %>%
   filter((exp == 1 & value > 0) | (exp == 2 & value < 0)) %>%
   group_by(name) %>%
   filter(length(unique(exp)) == 2) %>%
   select(name) %>%
   unique()
 #Source: local data frame [1 x 1]
 #Groups: name
 #
 #   name
 #1 gene2

+1

Francisco Rodríguez Algarra Dec 01 '14 at 15:27

filter() accepts several conditions separated by commas, in the same way that select() accepts several columns. Each additional condition is combined with the previous ones using AND:

group_by(df, name) %>% filter(value[exp == 1] > 0, value[exp == 2] < 0)

From the official documentation: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html

The examples given there:

flights[flights$month == 1 & flights$day == 1, ] in base R
filter(flights, month == 1, day == 1) in dplyr.
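To make the equivalence concrete, here is a small sketch that applies the same idea to the df from the question instead of flights. The result names with_comma, with_amp and base_r are made up for this illustration.

```r
library(dplyr)

df <- data.frame(
  exp   = c(1, 1, 2, 2),
  name  = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1)
)

# Comma-separated conditions are combined with AND ...
with_comma <- filter(df, exp == 1, value > 0)

# ... so this is the same as writing a single condition with &.
with_amp <- filter(df, exp == 1 & value > 0)

# Base R equivalent, which needs the df$ prefix on every column.
base_r <- df[df$exp == 1 & df$value > 0, ]

# Compare the two dplyr results.
all.equal(with_comma, with_amp)
```

All three calls select the same rows, namely the two experiment-1 rows with a positive value.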

+1

pablo_sci Jun 08 '17 at 21:16

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2014-12-01T15:27:18+0000

One option that comes to mind is to convert the data to a "wide" format and then filter.

Here is an example using "data.table" (for the convenience of compound statements):

 library(data.table)
 dcast.data.table(as.data.table(df), name ~ exp)[`1` > 0 & `2` < 0]
 #     name 1  2
 # 1: gene2 1 -1

Similarly, with "dplyr" and "tidyr":

 library(dplyr)
 library(tidyr)
 df %>%
   spread(exp, value) %>%
   filter(`1` > 0 & `2` < 0)
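In more recent versions of tidyr, spread() has been superseded by pivot_wider(). The following is a sketch of the same wide-format approach using that function; it assumes a tidyr version that provides pivot_wider(), and the "exp" column prefix and the name wide are choices made here, not part of the original answer.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  exp   = c(1, 1, 2, 2),
  name  = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1)
)

# One row per gene, one column per experiment (exp1, exp2).
# names_prefix avoids column names that need backticks.
wide <- df %>%
  pivot_wider(names_from = exp, values_from = value, names_prefix = "exp")

# Both conditions can now be checked on the same row.
result <- wide %>%
  filter(exp1 > 0, exp2 < 0)

result
```

Once each experiment is its own column, the two conditions refer to the same row, so a plain filter() with no grouping is enough.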
