Comparison of Boolean vectors

Question

I have a data frame with four logical vectors, v1, v2, v3, v4, each of which is TRUE or FALSE. I need to classify each row based on the combination of these Boolean vectors (for example, "None", "v1 only", "v1 and v3", "All", etc.). I would like to do this without subsetting the data frame or nesting ifelse statements. Any suggestions on the best way to do this? Thanks!

+4

r boolean

Boom shakalaka Dec 14 '11 at 2:23

4 answers

Here is one approach, based on the fact that TRUE/FALSE can be represented as 1s and 0s. You can multiply the Boolean values by their column index and then paste all the values together. This tells you which columns were TRUE in each row. Here is an example:

    set.seed(1)
    dat <- data.frame(v1 = sample(c(T,F), 10, TRUE),
                      v2 = sample(c(T,F), 10, TRUE),
                      v3 = sample(c(T,F), 10, TRUE),
                      v4 = sample(c(T,F), 10, TRUE)
    )
    # End fake data

    # Multiply T/F by the column index
    dat <- dat * rep(seq_len(ncol(dat)), each = nrow(dat))

    # Paste together in a new column
    dat$v5 <- apply(dat, 1, function(x) paste(x, collapse = ""))

    > dat
      v1 v2 v3 v4   v5
    1  0  0  3  4 0034
    2  0  2  0  4 0204
    ...

Edit: incorporating the helpful comments below and the follow-up question

I would create a lookup table using expand.grid() and then write whatever text labels you see fit for each combination. Here is an example with two columns:

    set.seed(1)
    dat <- data.frame(v1 = sample(c(T,F), 10, TRUE),
                      v2 = sample(c(T,F), 10, TRUE)
    )

    # Thanks @Joshua
    dat$comp <- as.character(apply(1 * dat, 1, paste, collapse = ""))

    # Look up table
    lookup <- data.frame(comp = apply(expand.grid(0:1, 0:1), 1, paste, collapse = ""),
                         text = c("none", "v1 only", "v2 only", "all"),
                         stringsAsFactors = FALSE
    )

    # Use merge to join the look up table to your data.
    # Note the consistent naming of the comp column
    > merge(dat, lookup)
      comp    v1    v2    text
    1   00 FALSE FALSE    none
    2   00 FALSE FALSE    none
    3   01 FALSE  TRUE v2 only
    ....
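With four columns there are 16 combinations, so typing the labels by hand gets tedious. The following is a minimal sketch, not part of the original answer, that builds the lookup table programmatically. The example data, the `vars` vector, and the labelling rules ("None", "All", "x only", "x and y") are assumptions chosen to match the wording in the question.

```r
# Hypothetical example data; column names v1..v4 follow the question.
dat <- data.frame(v1 = c(TRUE, FALSE, TRUE, FALSE),
                  v2 = c(TRUE, FALSE, FALSE, FALSE),
                  v3 = c(TRUE, FALSE, TRUE, FALSE),
                  v4 = c(TRUE, FALSE, FALSE, TRUE))
vars <- c("v1", "v2", "v3", "v4")

# Key each row by its 0/1 pattern, e.g. "1010"
dat$comp <- apply(1 * as.matrix(dat[vars]), 1, paste, collapse = "")

# Enumerate all 2^4 = 16 combinations
grid <- expand.grid(rep(list(0:1), length(vars)))
names(grid) <- vars

# Assumed labelling rules; adjust to taste
make_label <- function(x) {
  on <- vars[x == 1]
  if (length(on) == 0) {
    "None"
  } else if (length(on) == length(vars)) {
    "All"
  } else if (length(on) == 1) {
    paste(on, "only")
  } else {
    paste(on, collapse = " and ")
  }
}

lookup <- data.frame(comp = apply(as.matrix(grid), 1, paste, collapse = ""),
                     text = apply(as.matrix(grid), 1, make_label),
                     stringsAsFactors = FALSE)

# match() keeps the original row order, which merge() does not guarantee
dat$text <- lookup$text[match(dat$comp, lookup$comp)]
```

Using `match()` instead of `merge()` is a design choice here: it adds the label as a column without reordering the rows of `dat`.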

+3

Chase Dec 14 '11 at 2:43

Let me throw my hat in the ring as well

    plyr::adply(dat, 1, function(x) paste(names(Filter(isTRUE, x)), collapse = " and "))

          v1    v2    v3    v4               V1
    1   TRUE  TRUE FALSE  TRUE v1 and v2 and v4
    2   TRUE  TRUE  TRUE FALSE v1 and v2 and v3
    3  FALSE FALSE FALSE  TRUE               v4
    4  FALSE  TRUE  TRUE  TRUE v2 and v3 and v4
    5   TRUE FALSE  TRUE FALSE        v1 and v3
    6  FALSE  TRUE  TRUE FALSE        v2 and v3
    7  FALSE FALSE  TRUE FALSE               v3
    8  FALSE FALSE  TRUE  TRUE        v3 and v4
    9  FALSE  TRUE FALSE FALSE               v2
    10  TRUE FALSE  TRUE  TRUE v1 and v3 and v4
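For readers who would rather not depend on plyr, the same idea can be sketched in base R with apply(). This is an assumed equivalent rather than the answerer's code; the small data frame and the "None" label for all-FALSE rows are illustrative choices.

```r
# Hypothetical example data
dat <- data.frame(v1 = c(TRUE, FALSE, FALSE),
                  v2 = c(TRUE, FALSE, TRUE),
                  v3 = c(FALSE, FALSE, TRUE))

# Convert once to a logical matrix so each row arrives as a logical vector
m <- as.matrix(dat)

# Keep the names of the TRUE columns and join them with " and "
dat$label <- apply(m, 1, function(x) paste(colnames(m)[x], collapse = " and "))

# Rows with no TRUE values produce an empty string; give them a label
dat$label[dat$label == ""] <- "None"
```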

+2

Ramnath Dec 14 '11 at 5:22

    set.seed(123)
    dat <- data.frame(v1 = sample(c(T,F), 10, TRUE),
                      v2 = sample(c(T,F), 10, TRUE),
                      v3 = sample(c(T,F), 10, TRUE),
                      v4 = sample(c(T,F), 10, TRUE)
    )
    dat

The first strategy combines several logical patterns into an index into a character vector. Rows that match none of the patterns get the default index 1, which maps to "Other":

    dat$bcateg <- c("Other", "v2 only", "v1 and v3", "All")[1 +
        with(dat, 1*(v2 & !v1 & !v3 & !v4)) +
        with(dat, 2*(v1 & v3)) +
        with(dat, v1 & v2 & v3 & v4)]

    > dat
          v1    v2    v3    v4    bcateg
    1   TRUE FALSE FALSE FALSE     Other
    2  FALSE  TRUE FALSE FALSE   v2 only
    3   TRUE FALSE FALSE FALSE     Other
    4  FALSE FALSE FALSE FALSE     Other
    5  FALSE  TRUE FALSE  TRUE     Other
    6   TRUE FALSE FALSE  TRUE     Other
    7  FALSE  TRUE FALSE FALSE   v2 only
    8  FALSE  TRUE FALSE  TRUE     Other
    9  FALSE  TRUE  TRUE  TRUE     Other
    10  TRUE FALSE  TRUE  TRUE v1 and v3

The second strategy combines TRUE column names using the delimiter ",":

    dat$bcateg2 <- paste(c("", "v1")[dat[["v1"]] + 1],
                         c("", "v2")[dat[["v2"]] + 1],
                         c("", "v3")[dat[["v3"]] + 1],
                         c("", "v4")[dat[["v4"]] + 1],
                         sep = ",")

    > dat
          v1    v2    v3    v4    bcateg   bcateg2
    1   TRUE FALSE FALSE FALSE     Other     v1,,,
    2  FALSE  TRUE FALSE FALSE   v2 only     ,v2,,
    3   TRUE FALSE FALSE FALSE     Other     v1,,,
    4  FALSE FALSE FALSE FALSE     Other       ,,,
    5  FALSE  TRUE FALSE  TRUE     Other   ,v2,,v4
    6   TRUE FALSE FALSE  TRUE     Other   v1,,,v4
    7  FALSE  TRUE FALSE FALSE   v2 only     ,v2,,
    8  FALSE  TRUE FALSE  TRUE     Other   ,v2,,v4
    9  FALSE  TRUE  TRUE  TRUE     Other ,v2,v3,v4
    10  TRUE FALSE  TRUE  TRUE v1 and v3 v1,,v3,v4
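The pasted labels keep an empty slot for every FALSE column, which leaves stray commas. If tidier labels are wanted, one possible cleanup (an addition, not part of the original answer) is two gsub() passes. The vector `x` below is a hypothetical stand-in for the bcateg2 column.

```r
# Hypothetical sample of labels in the "v1,,,v4" style
x <- c("v1,,,", ",v2,,v4", ",,,", "v1,,v3,v4")

# Collapse each run of commas into a single comma
x <- gsub(",+", ",", x)

# Drop a comma left at the start or end of the string
x <- gsub("^,|,$", "", x)
```

A row with no TRUE columns ends up as an empty string, which can then be replaced with a label such as "None" if desired.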

+1

42- Dec 14 '11 at 4:17

Josh O'Brien · Accepted Answer · 2011-12-14T04:52:34+0000

Looks like I'm late to this party. Still, I might as well share what I brought!

This works by treating the FALSE/TRUE possibilities as bits and operating on them to assign each combination of v1, v2 and v3 a unique integer from 1 to 8 (much as chmod represents permissions as bits on *NIX systems). That integer is then used as an index to select the corresponding element of a vector of text descriptions.

(For demonstration, I used only three columns, but this approach scales well.)

    # CONSTRUCT VECTOR OF DESCRIPTIONS
    description <- c("None", "v1", "v2", "v1 and v2",
                     "v3", "v1 and v3", "v2 and v3", "All")

    # DEFINE DESCRIPTION FUNCTION
    getDescription <- function(X) {
        index <- 1 + sum(X*c(1,2,4))
        description[index]
    }

    # TRY IT OUT ON ALL COMBOS OF v1, v2, and v3
    df <- expand.grid(v1=c(FALSE, TRUE), v2=c(FALSE, TRUE), v3=c(FALSE, TRUE))
    df$description <- apply(df, 1, getDescription)

    # YEP, IT WORKS.
    df
    #      v1    v2    v3 description
    # 1 FALSE FALSE FALSE        None
    # 2  TRUE FALSE FALSE          v1
    # 3 FALSE  TRUE FALSE          v2
    # 4  TRUE  TRUE FALSE   v1 and v2
    # 5 FALSE FALSE  TRUE          v3
    # 6  TRUE FALSE  TRUE   v1 and v3
    # 7 FALSE  TRUE  TRUE   v2 and v3
    # 8  TRUE  TRUE  TRUE         All
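To illustrate how the bit-weight idea might extend to the four columns in the question, here is a rough sketch under a few assumptions: the weights are generated as powers of two, the descriptions are built from expand.grid() rather than typed by hand, and the index is computed for all rows at once with a matrix product. The names `vars`, `weights`, `combos`, and `code` are illustrative, not from the answer.

```r
# Hypothetical example data with the four columns from the question
dat <- data.frame(v1 = c(FALSE, TRUE, TRUE, TRUE),
                  v2 = c(FALSE, FALSE, TRUE, TRUE),
                  v3 = c(FALSE, FALSE, FALSE, TRUE),
                  v4 = c(FALSE, TRUE, FALSE, TRUE))
vars <- c("v1", "v2", "v3", "v4")

# One bit per column: 1, 2, 4, 8
weights <- 2^(seq_along(vars) - 1)

# expand.grid() varies its first argument fastest, so row i of combos
# corresponds to the bit pattern whose weighted sum is i - 1
combos <- expand.grid(rep(list(c(FALSE, TRUE)), length(vars)))
description <- apply(as.matrix(combos), 1, function(x) {
  on <- vars[x]
  if (length(on) == 0) "None" else paste(on, collapse = " and ")
})
description[length(description)] <- "All"

# Weighted sum for every row at once (0..15), then index the descriptions
code <- as.vector((1 * as.matrix(dat[vars])) %*% weights)
dat$description <- description[code + 1]
```

The matrix product replaces the row-wise apply() call, which may matter for large data frames; the result should be the same either way.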
