Selecting columns based on row values in multiple columns using dplyr

I am trying to select the columns in which at least one row has the value 1, counting only rows that also have a specific value in another column. I would prefer to do this with dplyr, but any computationally efficient solution is welcome.

Example:

Among a1, a2, a3, select the columns that contain at least one row where the value is 1 and column b == "B".

Sample data:

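# Note: sample() output depends on the R version (the default sampling algorithm changed in R 3.6.0), so the values may differ from the table below.
rand <- function(S) {set.seed(S); sample(x = c(0, 1), size = 3, replace = TRUE)}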
df <- data.frame(a1=rand(1),a2=rand(2),a3=rand(3),b=c("A","B","A"))

Input data:

  a1 a2 a3 b
1  0  0  0 A
2  0  1  1 B
3  1  1  0 A

Desired output:

  a2 a3
1  0  0
2  1  1
3  1  0

I managed to get the correct output with the following code, but it is very inefficient, and I need to run it on a very large data frame (365,000 rows × 314 columns).

df %>% select_if(function(x) any(paste0(x,.$b) == '1B'))
2 answers

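Using dplyr (with reshape2::melt to reshape the data to long format):

ids <- df %>% 
  reshape2::melt(id.vars = "b") %>%      # long format: b, variable, value
  filter(value == 1 & b == "B") %>%      # rows that satisfy both conditions
  distinct(variable) %>%                 # one entry per matching column
  pull(variable) %>%
  as.character()                         # column names, not factor codes

df[, ids, drop = FALSE]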

#  a2 a3
#1  0  0
#2  1  1
#3  1  0
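
If you would rather stay in dplyr without reshaping to long format, something along these lines might work. This is only a sketch, assuming dplyr 1.0.0 or later (for where()); the name is_b is mine, the input table from the question is typed in literally so it does not depend on the random seed, and I have not benchmarked it on data of the size mentioned in the question.

```r
library(dplyr)

# Input data from the question, written out literally
df <- data.frame(a1 = c(0, 0, 1),
                 a2 = c(0, 1, 1),
                 a3 = c(0, 1, 0),
                 b  = c("A", "B", "A"))

# Compute the row mask once, instead of pasting strings for every column
is_b <- df$b == "B"

# Keep numeric columns that have at least one 1 among the rows where b == "B"
result <- df %>%
  select(where(~ is.numeric(.x) && any(.x[is_b] == 1, na.rm = TRUE)))

print(result)
```

The is.numeric() check simply skips column b itself; the comparison is done on numbers rather than on pasted strings, which is where I would expect most of the saving to come from.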

@docendo-discimus,

Base R, without dplyr:

df[sapply(df[df$b == "B",], function(x) 1 %in% x)]
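
A vectorised variation on the same idea, in case it helps on a wide data frame: subset the rows once, compare the whole block to 1, and let colSums() count the hits per column. This is a sketch under a few assumptions: the candidate columns are listed explicitly in a_cols (a name I made up), they are all numeric, and the input table from the question is typed in literally. I have not timed it against the sapply() version.

```r
# Input data from the question, written out literally
df <- data.frame(a1 = c(0, 0, 1),
                 a2 = c(0, 1, 1),
                 a3 = c(0, 1, 0),
                 b  = c("A", "B", "A"))

# Candidate columns to test (assumed to be numeric)
a_cols <- c("a1", "a2", "a3")

# Rows where the condition on b holds
rows_b <- df$b == "B"

# For each candidate column, count the 1s among those rows
hits <- colSums(df[rows_b, a_cols, drop = FALSE] == 1, na.rm = TRUE) > 0

# Keep only the columns with at least one hit; drop = FALSE keeps a data frame
result <- df[, a_cols[hits], drop = FALSE]

print(result)
```

Listing the candidate columns explicitly also means column b is never compared against 1, so there is no reliance on type coercion inside %in%.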

Source: https://habr.com/ru/post/1690315/