Replace value with consecutive column for multiple columns r

Question

I have a dataframe like this:

     Q1  Q1a    Q2    Q2a   Q2b
1   foo <NA>    fee   <NA>  <NA>
2   bar <NA>  other    ree  <NA>
3 other  roo    bee   <NA>  <NA>
4   bar <NA>    fee   <NA>  <NA>
5   bar <NA>  other    fee  <NA>
6 other  fee  other    <NA>  roo

I would like to replace every occurrence of "other" with the value from a following column in the same row (i.e. the Qxa, Qxb, ... columns) so that I can get rid of the sparse columns:

     Q1    Q2  
1   foo   fee  
2   bar   ree
3   roo   bee 
4   bar   fee 
5   bar   fee
6   fee   roo

I could imagine how to do this for a single column:

Lines = "
Q1  Q1a    Q2    Q2a   Q2b
foo NA fee  NA  NA
bar NA other  ree  NA
other roo  bee   NA  NA
bar NA    fee   NA  NA
bar NA    other fee  NA
other  fee  other NA  roo"

df = read.table(text = Lines, header = TRUE, as.is = TRUE)

df$Q1[df$Q1=='other'] = df$Q1a[df$Q1=='other']

and then a loop for each column, but it's a little tedious and slow with lots of columns and multiple values in each column.

Is there an easier way to do this? (Also, my method does not extend cleanly to the Q2a/Q2b case, where the replacement can come from either column.)

+4

r

TMrtSmith Dec 04 '17 at 15:54

2 answers

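With nested ifelse calls, although this needs one expression per question and gets verbose with more groups (Q3, Q3a-Q3f, ...):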

library(tidyverse)

new_df <- df %>%
    mutate(Q1 = ifelse(Q1 == 'other', Q1a, Q1),
           # fall back to Q2a, then Q2b; keep Q2 when it is not 'other'
           Q2 = ifelse(Q2 == 'other',
                       ifelse(!is.na(Q2a), Q2a, Q2b),
                       Q2)) %>%
    select(Q1, Q2)
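
As a possible variation on the Q2 line, the inner ifelse could be swapped for dplyr's coalesce(), which returns the first non-missing value across its arguments. This is only a sketch, not part of the original answer; the data frame is rebuilt here by hand so the block runs on its own, and it assumes the follow-up columns are all character.

```r
library(dplyr)

# The question's data, rebuilt so this block is self-contained.
df <- data.frame(
  Q1  = c("foo", "bar", "other", "bar", "bar", "other"),
  Q1a = c(NA, NA, "roo", NA, NA, "fee"),
  Q2  = c("fee", "other", "bee", "fee", "other", "other"),
  Q2a = c(NA, "ree", NA, NA, "fee", NA),
  Q2b = c(NA, NA, NA, NA, NA, "roo"),
  stringsAsFactors = FALSE
)

new_df <- df %>%
  mutate(Q1 = ifelse(Q1 == "other", Q1a, Q1),
         # coalesce() takes the first non-NA of Q2a, Q2b for each row
         Q2 = ifelse(Q2 == "other", coalesce(Q2a, Q2b), Q2)) %>%
  select(Q1, Q2)

new_df
```

This still needs one line per question, so it shares the scaling limitation noted above.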

+1

Matt W. Dec 04 '17 at 16:38

akrun · Accepted Answer · 2017-12-04T16:10:53+0000

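We can split the dataset into a list of data.frames by column-name prefix (removing the trailing lowercase letters with sub), then, within each group, replace "other" in the first column with a value from the remaining columns, using max.col to find the column that holds it: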

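data.frame(lapply(split.default(df, sub("[a-z]+$", "", names(df))), function(x) {
  # rows where the main column holds "other"
  i1 <- x[, 1] == "other"
  # usable replacement values in the follow-up columns
  i2 <- x[-1] != "other" & !is.na(x[-1])
  # pick one usable follow-up value per "other" row via matrix indexing
  x[, 1][i1] <- x[-1][cbind(1:nrow(i2), max.col(i2) * i1)]
  x[, 1]
}))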
#   Q1  Q2
#1 foo fee
#2 bar ree
#3 roo bee
#4 bar fee
#5 bar fee
#6 fee roo
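
To make the grouping step easier to follow, here is a small sketch of what the sub() and split.default() calls produce on the question's data. The names key and groups are only illustrative and do not appear in the answer; the data frame is rebuilt by hand so the block is self-contained.

```r
# The question's data, rebuilt so this block is self-contained.
df <- data.frame(
  Q1  = c("foo", "bar", "other", "bar", "bar", "other"),
  Q1a = c(NA, NA, "roo", NA, NA, "fee"),
  Q2  = c("fee", "other", "bee", "fee", "other", "other"),
  Q2a = c(NA, "ree", NA, NA, "fee", NA),
  Q2b = c(NA, NA, NA, NA, NA, "roo"),
  stringsAsFactors = FALSE
)

# Strip the trailing lowercase suffix so that Q1a -> Q1 and Q2b -> Q2.
key <- sub("[a-z]+$", "", names(df))
key
# [1] "Q1" "Q1" "Q2" "Q2" "Q2"

# split.default() treats the data frame as a list of columns,
# so it splits by column rather than by row.
groups <- split.default(df, key)
lapply(groups, names)
# $Q1
# [1] "Q1"  "Q1a"
#
# $Q2
# [1] "Q2"  "Q2a" "Q2b"
```

Each element of groups is a data frame whose first column is the main question and whose remaining columns are the follow-ups, which is the shape the anonymous function above relies on.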

Alternatively, with the tidyverse:

library(dplyr)
library(purrr)
split.default(df, sub("[a-z]+$", "", names(df))) %>%
       map_df(~ replace(., .== 'other', NA) %>% 
                 do.call(paste, .) %>%
                 gsub("\\s*|(NA\\s*)+", "", .))
# A tibble: 6 x 2
#     Q1    Q2
#   <chr> <chr>
#1   foo   fee
#2   bar   ree
#3   roo   bee
#4   bar   fee
#5   bar   fee
#6   fee   roo
