Comparison of Boolean vectors

I have a data frame with four logical vectors, v1, v2, v3, v4, each of which is TRUE or FALSE. I need to classify each row of the data frame based on the combination of Boolean values (for example, "None", "v1 only", "v1 and v3", "All", etc.). I would like to do this without subsetting the data frame or nesting ifelse statements. Any suggestions on the best way to do this? Thanks!

4 answers

Looks like I'm late for this party. However, I could also share what I brought!

This works by treating the FALSE/TRUE values as bits and combining them to assign each combination of v1, v2 and v3 a unique integer from 1 to 8 (much like the way chmod represents permissions as bits on *NIX systems). The integer is then used as an index to select the corresponding element of a vector of text descriptions.

(For demonstration, I used only three columns, but this approach scales well.)

    # CONSTRUCT VECTOR OF DESCRIPTIONS
    description <- c("None", "v1", "v2", "v1 and v2",
                     "v3", "v1 and v3", "v2 and v3", "All")

    # DEFINE DESCRIPTION FUNCTION
    getDescription <- function(X) {
        index <- 1 + sum(X * c(1, 2, 4))
        description[index]
    }

    # TRY IT OUT ON ALL COMBOS OF v1, v2, and v3
    df <- expand.grid(v1 = c(FALSE, TRUE), v2 = c(FALSE, TRUE), v3 = c(FALSE, TRUE))
    df$description <- apply(df, 1, getDescription)

    # YEP, IT WORKS.
    df
    #      v1    v2    v3 description
    # 1 FALSE FALSE FALSE        None
    # 2  TRUE FALSE FALSE          v1
    # 3 FALSE  TRUE FALSE          v2
    # 4  TRUE  TRUE FALSE   v1 and v2
    # 5 FALSE FALSE  TRUE          v3
    # 6  TRUE FALSE  TRUE   v1 and v3
    # 7 FALSE  TRUE  TRUE   v2 and v3
    # 8  TRUE  TRUE  TRUE         All
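To illustrate the "scales well" remark, here is a rough sketch of the same bit-weight idea extended to the four columns from the question. It is not part of the answer above: the names cols, weights, code, combos and labels are made up for this example, and the labels are generated rather than typed by hand.

```r
# Hypothetical example data: every combination of four logical columns
df <- expand.grid(v1 = c(FALSE, TRUE), v2 = c(FALSE, TRUE),
                  v3 = c(FALSE, TRUE), v4 = c(FALSE, TRUE))
cols <- c("v1", "v2", "v3", "v4")

# One power of two per column: 1, 2, 4, 8
weights <- 2^(seq_along(cols) - 1)

# data.matrix() turns the logicals into 0/1 integers; the matrix product
# then gives each row an integer code between 0 and 15
code <- drop(data.matrix(df[cols]) %*% weights)

# Build the labels in the same order as the codes (first column = lowest bit)
combos <- expand.grid(rep(list(c(FALSE, TRUE)), length(cols)))
labels <- apply(combos, 1, function(x) paste(cols[x], collapse = " and "))
labels[1] <- "None"
labels[length(labels)] <- "All"

# Codes start at 0, R indexing starts at 1
df$description <- labels[code + 1]
```

Because the code is computed for all rows at once, this variant avoids the row-wise apply() call, which may matter on large data frames.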

Here is one approach based on the fact that TRUE / FALSE can be represented as 1s and 0s. You can multiply the Boolean values by their column index and then paste all the values together. This will tell you which columns were TRUE for each row. Here is an example:

    set.seed(1)
    dat <- data.frame(v1 = sample(c(T,F), 10, TRUE),
                      v2 = sample(c(T,F), 10, TRUE),
                      v3 = sample(c(T,F), 10, TRUE),
                      v4 = sample(c(T,F), 10, TRUE)
                      )
    # End fake data

    # Multiply T/F by the column index
    dat <- dat * rep(seq_len(ncol(dat)), each = nrow(dat))

    # Paste together in a new column
    dat$v5 <- apply(dat, 1, function(x) paste(x, collapse = ""))

    > dat
       v1 v2 v3 v4   v5
    1   0  0  3  4 0034
    2   0  2  0  4 0204
    ...

Edit, incorporating the helpful comments below and the follow-up question:

I would create a lookup table using expand.grid() and then write text labels to represent them as you see fit. Here is an example with two columns:

    set.seed(1)
    dat <- data.frame(v1 = sample(c(T,F), 10, TRUE),
                      v2 = sample(c(T,F), 10, TRUE)
                      )

    # Thanks @Joshua
    dat$comp <- as.character(apply(1 * dat, 1, paste, collapse = ""))

    # Look up table
    lookup <- data.frame(comp = apply(expand.grid(0:1, 0:1), 1, paste, collapse = ""),
                         text = c("none", "v1 only", "v2 only", "all"),
                         stringsAsFactors = FALSE
                         )

    # Use merge to join the look up table to your data.
    # Note the consistent naming of the comp column
    > merge(dat, lookup)
      comp    v1    v2    text
    1   00 FALSE FALSE    none
    2   00 FALSE FALSE    none
    3   01 FALSE  TRUE v2 only
    ....
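One thing to be aware of: merge() sorts the result on the key column by default, so the rows come back in a different order than they went in. If the original order matters, a possible alternative is to index a named character vector with the key. This is only a sketch, and the names key and lookup_vec are invented for the example.

```r
set.seed(1)
dat <- data.frame(v1 = sample(c(TRUE, FALSE), 10, TRUE),
                  v2 = sample(c(TRUE, FALSE), 10, TRUE))

# Build the same "00" / "10" / "01" / "11" key, one string per row
key <- paste0(1 * dat$v1, 1 * dat$v2)

# Named vector used as a lookup table (names are the keys)
lookup_vec <- c("00" = "none", "10" = "v1 only", "01" = "v2 only", "11" = "all")

# Indexing by name keeps the rows in their original order
dat$text <- unname(lookup_vec[key])
```

The result has one label per row of dat, in place, with no join step.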

Let me throw my hat in the ring as well

    plyr::adply(dat, 1, function(x) paste(names(Filter(isTRUE, x)), collapse = " and "))

          v1    v2    v3    v4               V1
    1   TRUE  TRUE FALSE  TRUE v1 and v2 and v4
    2   TRUE  TRUE  TRUE FALSE v1 and v2 and v3
    3  FALSE FALSE FALSE  TRUE               v4
    4  FALSE  TRUE  TRUE  TRUE v2 and v3 and v4
    5   TRUE FALSE  TRUE FALSE        v1 and v3
    6  FALSE  TRUE  TRUE FALSE        v2 and v3
    7  FALSE FALSE  TRUE FALSE               v3
    8  FALSE FALSE  TRUE  TRUE        v3 and v4
    9  FALSE  TRUE FALSE FALSE               v2
    10  TRUE FALSE  TRUE  TRUE v1 and v3 and v4
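If you would rather not depend on plyr, roughly the same labels can be built in base R. The following is a sketch on a small made-up data frame, not the answerer's code; the label column name and the "None" fallback for all-FALSE rows are assumptions added for the example.

```r
# Small hypothetical data frame of logical columns
dat <- data.frame(v1 = c(TRUE, FALSE, FALSE),
                  v2 = c(TRUE, FALSE, TRUE),
                  v3 = c(FALSE, FALSE, TRUE))
cols <- c("v1", "v2", "v3")

# For each row, keep the names of the TRUE columns and join them
dat$label <- apply(dat[cols], 1, function(x) paste(cols[x], collapse = " and "))

# A row with no TRUE values yields an empty string; give it a readable label
dat$label[dat$label == ""] <- "None"

dat$label
# [1] "v1 and v2" "None"      "v2 and v3"
```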

    > set.seed(123)
    > dat <- data.frame(v1 = sample(c(T,F), 10, TRUE),
    +                   v2 = sample(c(T,F), 10, TRUE),
    +                   v3 = sample(c(T,F), 10, TRUE),
    +                   v4 = sample(c(T,F), 10, TRUE)
    +                   )
    > dat

The first strategy tests for the specific combinations of interest and adds up the results to index into a character vector; rows that match none of the patterns keep the default index of 1, which is "Other":

    > dat$bcateg <- c("Other", "v2 only", "v1 and v3", "All")[1+
    +      with(dat, 1*(v2 & !v1 &!v3 &!v4)) +
    +      with(dat, 2*(v1&v3))+
    +      with(dat, v1&v2&v3&v4)]
    > dat
          v1    v2    v3    v4    bcateg
    1   TRUE FALSE FALSE FALSE     Other
    2  FALSE  TRUE FALSE FALSE   v2 only
    3   TRUE FALSE FALSE FALSE     Other
    4  FALSE FALSE FALSE FALSE     Other
    5  FALSE  TRUE FALSE  TRUE     Other
    6   TRUE FALSE FALSE  TRUE     Other
    7  FALSE  TRUE FALSE FALSE   v2 only
    8  FALSE  TRUE FALSE  TRUE     Other
    9  FALSE  TRUE  TRUE  TRUE     Other
    10  TRUE FALSE  TRUE  TRUE v1 and v3
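It may not be obvious why "All" ends up at position 4. An all-TRUE row also satisfies v1 & v3, so it collects 2 from that test plus 1 from the all-TRUE test, on top of the base index of 1. The sketch below checks this on a made-up four-row data frame with one row per category; the name idx is invented for the example.

```r
# One hypothetical row per category: Other, v2 only, v1 and v3, All
dat <- data.frame(v1 = c(FALSE, FALSE, TRUE, TRUE),
                  v2 = c(FALSE, TRUE, FALSE, TRUE),
                  v3 = c(FALSE, FALSE, TRUE, TRUE),
                  v4 = c(FALSE, FALSE, FALSE, TRUE))

# Same index arithmetic as in the answer, kept as a separate vector
idx <- with(dat, 1 +
                 1 * (v2 & !v1 & !v3 & !v4) +
                 2 * (v1 & v3) +
                 (v1 & v2 & v3 & v4))
idx
# [1] 1 2 3 4

c("Other", "v2 only", "v1 and v3", "All")[idx]
# [1] "Other"     "v2 only"   "v1 and v3" "All"
```

Note that the weights only work because the patterns were chosen so that their sums never collide; adding more patterns means rechecking that by hand.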

The second strategy pastes together the names of the TRUE columns, using "," as the delimiter:

    > dat$bcateg2 <- paste( c("","v1")[dat[["v1"]]+1 ],
    +                       c("","v2")[dat[["v2"]]+1 ],
    +                       c("","v3")[dat[["v3"]]+1 ],
    +                       c("","v4")[dat[["v4"]]+1 ],
    +                       sep = ",")
    > dat
          v1    v2    v3    v4    bcateg   bcateg2
    1   TRUE FALSE FALSE FALSE     Other     v1,,,
    2  FALSE  TRUE FALSE FALSE   v2 only     ,v2,,
    3   TRUE FALSE FALSE FALSE     Other     v1,,,
    4  FALSE FALSE FALSE FALSE     Other       ,,,
    5  FALSE  TRUE FALSE  TRUE     Other   ,v2,,v4
    6   TRUE FALSE FALSE  TRUE     Other   v1,,,v4
    7  FALSE  TRUE FALSE FALSE   v2 only     ,v2,,
    8  FALSE  TRUE FALSE  TRUE     Other   ,v2,,v4
    9  FALSE  TRUE  TRUE  TRUE     Other ,v2,v3,v4
    10  TRUE FALSE  TRUE  TRUE v1 and v3 v1,,v3,v4
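The empty slots leave stray commas in the labels. If that bothers you, one possible cleanup is to collapse runs of commas and trim the ends with gsub(). This is just a sketch on a small made-up data frame; the names cols, parts, raw and clean are invented for the example.

```r
# Small hypothetical data frame of logical columns
dat <- data.frame(v1 = c(TRUE, FALSE, FALSE, TRUE),
                  v2 = c(FALSE, FALSE, TRUE, FALSE),
                  v3 = c(FALSE, FALSE, FALSE, TRUE),
                  v4 = c(FALSE, FALSE, TRUE, TRUE))
cols <- c("v1", "v2", "v3", "v4")

# Same trick as above: FALSE + 1 picks "", TRUE + 1 picks the column name
parts <- lapply(cols, function(nm) c("", nm)[dat[[nm]] + 1])
raw <- do.call(paste, c(parts, sep = ","))

# Collapse repeated commas, then strip any leading or trailing comma
clean <- gsub("^,+|,+$", "", gsub(",{2,}", ",", raw))

raw
# [1] "v1,,,"     ",,,"       ",v2,,v4"   "v1,,v3,v4"
clean
# [1] "v1"       ""         "v2,v4"    "v1,v3,v4"
```

An all-FALSE row comes out as an empty string, which you could replace with a label such as "None" if you need one.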

Source: https://habr.com/ru/post/1386225/

