R - convert several dummy / logical variables into a single categorical variable / factor based on their names

My question bears a strong resemblance to this one and this other one, but my dataset is a bit different, and I cannot get those solutions to work. Excuse me if I have misunderstood something and this question is redundant.

I have a dataset like this:

    df <- data.frame(
      id = c(1:5),
      conditionA = c(1, NA, NA, NA, 1),
      conditionB = c(NA, 1, NA, NA, NA),
      conditionC = c(NA, NA, 1, NA, NA),
      conditionD = c(NA, NA, NA, 1, NA)
    )
    #   id conditionA conditionB conditionC conditionD
    # 1  1          1         NA         NA         NA
    # 2  2         NA          1         NA         NA
    # 3  3         NA         NA          1         NA
    # 4  4         NA         NA         NA          1
    # 5  5          1         NA         NA         NA

(Note: besides these columns, I have many other columns that should not be affected by the current manipulation.)

I noticed that conditionA, conditionB, conditionC and conditionD are mutually exclusive and would be better represented as a single categorical variable, i.e. a factor, which should look like this:

    #   id       type
    # 1  1 conditionA
    # 2  2 conditionB
    # 3  3 conditionC
    # 4  4 conditionD
    # 5  5 conditionA

I explored using gather or unite from tidyr, but they do not seem to fit this case (with unite, we lose the information carried by the variable name).

I tried to use kimisc::coalesce.na, as suggested in the first answer, but 1. I first need to set a factor value based on the name of each column, and 2. it does not work as expected, since only the first column is picked up:

    library(kimisc)
    library(magrittr)  # provides %>%

    # first, factor each condition with a specific label
    df$conditionA <- df$conditionA %>% factor(levels = 1, labels = "conditionA")
    df$conditionB <- df$conditionB %>% factor(levels = 1, labels = "conditionB")
    df$conditionC <- df$conditionC %>% factor(levels = 1, labels = "conditionC")
    df$conditionD <- df$conditionD %>% factor(levels = 1, labels = "conditionD")

    # now coalesce.na to merge into a single variable
    df$type <- coalesce.na(df$conditionA, df$conditionB, df$conditionC, df$conditionD)
    df
    #   id conditionA conditionB conditionC conditionD       type
    # 1  1 conditionA       <NA>       <NA>       <NA> conditionA
    # 2  2       <NA> conditionB       <NA>       <NA>       <NA>
    # 3  3       <NA>       <NA> conditionC       <NA>       <NA>
    # 4  4       <NA>       <NA>       <NA> conditionD       <NA>
    # 5  5 conditionA       <NA>       <NA>       <NA> conditionA

I tried other suggestions from the second question, but none of them gave me the expected result.

3 answers

You can also try:

    colnames(df)[2:5][max.col(!is.na(df[,2:5]))]
    #[1] "conditionA" "conditionB" "conditionC" "conditionD" "conditionA"

The above works if each row has exactly one non-NA column. If a row can consist entirely of NAs, you can try:

    mat <- !is.na(df[,2:5])
    colnames(df)[2:5][max.col(mat)*(NA^!rowSums(mat))]
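
As a rough sketch of how this idea could be turned into the requested factor column while leaving all other columns alone: the names `cond_cols` and `idx`, the column-name prefix match, and the extra all-NA row (id 6) are assumptions added for illustration, not part of the original data.

```r
# Example data from the question, plus a hypothetical all-NA row (id 6)
df <- data.frame(
  id = 1:6,
  conditionA = c(1, NA, NA, NA, 1, NA),
  conditionB = c(NA, 1, NA, NA, NA, NA),
  conditionC = c(NA, NA, 1, NA, NA, NA),
  conditionD = c(NA, NA, NA, 1, NA, NA)
)

# Pick the dummy columns by name rather than by position (assumed prefix)
cond_cols <- grep("^condition", names(df), value = TRUE)

# Logical matrix: TRUE where a condition is set
mat <- !is.na(df[, cond_cols])

# Column index of the set condition per row; rows with nothing set become NA
idx <- max.col(mat, ties.method = "first")
idx[rowSums(mat) == 0] <- NA

# Store the column name as a factor with one level per original column
df$type <- factor(cond_cols[idx], levels = cond_cols)

# Drop the dummy columns; every other column is left untouched
df <- df[, !(names(df) %in% cond_cols)]
df
```

Using `ties.method = "first"` makes the result deterministic if a row unexpectedly has more than one condition set; the default for `max.col` breaks ties at random.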

Try:

    library(dplyr)
    library(tidyr)

    df %>%
      gather(type, value, -id) %>%
      na.omit() %>%
      select(-value) %>%
      arrange(id)

Which gives:

    #  id       type
    #1  1 conditionA
    #2  2 conditionB
    #3  3 conditionC
    #4  4 conditionD
    #5  5 conditionA

Update

To handle the case described in detail in the comments, you can perform the operation on the desired part of the data frame, and then left_join() other columns:

    df %>%
      select(starts_with("condition"), id) %>%
      gather(type, value, -id) %>%
      na.omit() %>%
      select(-value) %>%
      left_join(., df %>% select(-starts_with("condition"))) %>%
      arrange(id)
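
In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`, which keeps the non-pivoted columns by itself, so the join is not needed. A minimal sketch, assuming a recent tidyr; the `other` column and the `result` name are hypothetical and stand in for the extra columns mentioned in the question:

```r
library(dplyr)
library(tidyr)

# Example data from the question, plus a hypothetical extra column `other`
df <- data.frame(
  id = 1:5,
  conditionA = c(1, NA, NA, NA, 1),
  conditionB = c(NA, 1, NA, NA, NA),
  conditionC = c(NA, NA, 1, NA, NA),
  conditionD = c(NA, NA, NA, 1, NA),
  other = letters[1:5]
)

result <- df %>%
  pivot_longer(
    cols = starts_with("condition"),  # only the dummy columns are reshaped
    names_to = "type",                # column names become the values of `type`
    values_to = "value",
    values_drop_na = TRUE             # keep only the condition that is set
  ) %>%
  select(-value) %>%
  mutate(type = factor(type))

result
```

Note that a row in which every condition is NA would be dropped by `values_drop_na = TRUE`; if such rows must be kept, the `left_join()` approach above, joining the reshaped result back onto the full data frame, is one way to preserve them.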

Alternatively:

    library(tidyr)
    library(dplyr)

    # reshape to long format, then keep only the rows where a condition is set
    df <- df %>% gather(type, count, -id)
    df <- df[complete.cases(df),][,-3]
    df[order(df$id),]
    #    id       type
    # 1   1 conditionA
    # 7   2 conditionB
    # 13  3 conditionC
    # 19  4 conditionD
    # 5   5 conditionA

Source: https://habr.com/ru/post/987518/

