First your data:
c <- c("9", "286593", "C", "C/C", "C/A", "A/A")
# Note: In your original data, you had a space in "G/A", which I removed.
# If this was not a mistake, we would also have to deal with the space.
d <- c("9", "334337", "A", "A/A", "G/A", "A/A")
e <- c("9", "390512", "C", "C/C", "C/C", "C/C")
dat <- data.frame(rbind(c,d,e))
Now we will generate a vector containing all possible letters.
values <- c("A", "C", "G", "T")
dat$X3 <- factor(dat$X3, levels=values)
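The `compare` function in the next step looks things up in a table called `combinations`, with columns `f`, `s` and `v`, whose definition is not shown here. A minimal sketch of how such a table could be built, assuming `f` and `s` hold the first and second letter of a pair and `v` holds the combined string as it appears in the data:

```r
# Assumption: the same four letters as in `values` above
values <- c("A", "C", "G", "T")

# Every ordered pair of letters. expand.grid() turns character input into
# factors by default, so f and s share the level set of `values`.
combinations <- expand.grid(f = values, s = values)

# The pair written the way it appears in the data, e.g. "C/A"
combinations$v <- paste(combinations$f, combinations$s, sep = "/")
```

Keeping `f` and `s` as factors with the same levels as `dat$X3` matters here: R refuses to compare two factors with `==` when their level sets differ.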
The main function looks up each entry of a column in the `combinations` table to get its two letters, and then compares each of them with the reference in column 3.
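compare <- function(col, val) {
  # position of each entry (e.g. "C/A") in the table of combinations
  m <- match(col, combinations$v)
  # start at 2 and subtract 1 for every letter that equals the reference
  2 - (combinations$f[m] == val) - (combinations$s[m] == val)
}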
Finally, we use apply to run the function for all columns that need to be changed. You probably want to change 6 to the actual number of columns.
dat[,4:6] <- apply(dat[,4:6], 2, compare, val=dat[,3])
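To check what this produces, here is the whole pipeline as one self-contained sketch on the sample data. The `combinations` table is an assumption (its definition is not part of the code above), and the row names `r1` to `r3` are used only in this example:

```r
# Sample data; row names r1..r3 are hypothetical, for this sketch only
dat <- data.frame(rbind(
  r1 = c("9", "286593", "C", "C/C", "C/A", "A/A"),
  r2 = c("9", "334337", "A", "A/A", "G/A", "A/A"),
  r3 = c("9", "390512", "C", "C/C", "C/C", "C/C")
))

values <- c("A", "C", "G", "T")
dat$X3 <- factor(dat$X3, levels = values)

# Assumed lookup table: first letter, second letter, combined string
combinations <- expand.grid(f = values, s = values)
combinations$v <- paste(combinations$f, combinations$s, sep = "/")

compare <- function(col, val) {
  m <- match(col, combinations$v)
  2 - (combinations$f[m] == val) - (combinations$s[m] == val)
}

dat[, 4:6] <- apply(dat[, 4:6], 2, compare, val = dat[, 3])

# Columns 4 to 6 now hold, per row:
#   r1 (reference C): 0 1 2
#   r2 (reference A): 0 1 0
#   r3 (reference C): 0 0 0
print(dat[, 3:6])
```

Each cell ends up as the number of letters in the pair that differ from the reference: "C/C" against "C" gives 0, "C/A" gives 1, and "A/A" gives 2.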
Please note that this solution, unlike the other solutions, does not use string comparison but relies solely on factors. It would be interesting to see which one performs best.
Edit
I just ran a benchmark:
    test replications elapsed relative user.self sys.self user.child sys.child
1   arun      1000000   2.881    1.116     2.864    0.024          0         0
2  fabio      1000000   2.593    1.005     2.558    0.030          0         0
3 roland      1000000   2.727    1.057     2.687    0.048          0         0
5  thilo      1000000   2.581    1.000     2.540    0.036          0         0
4  tyler      1000000   2.663    1.032     2.626    0.042          0         0
which leaves my version slightly ahead. However, the difference is negligible, so you are probably fine with any of these approaches. And honestly, I did not benchmark the part where I add the additional factor levels. Including that step would probably cost my version its lead.
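A table with these columns is what `benchmark()` from the rbenchmark package prints. The exact call used for the numbers above is not shown, so the following is only a rough sketch of how one entry could be timed, assuming that package is installed. The `combinations` table, the test vectors `ref` and `col`, and the reduced replication count are all assumptions made for illustration, and the other answers' functions are not reproduced:

```r
# Minimal setup so the sketch runs on its own
values <- c("A", "C", "G", "T")
combinations <- expand.grid(f = values, s = values)   # assumed lookup table
combinations$v <- paste(combinations$f, combinations$s, sep = "/")

compare <- function(col, val) {
  m <- match(col, combinations$v)
  2 - (combinations$f[m] == val) - (combinations$s[m] == val)
}

# Hypothetical test input: reference letters and one column of pairs
ref <- factor(c("C", "A", "C"), levels = values)
col <- c("C/A", "G/A", "C/C")

res <- NULL
if (requireNamespace("rbenchmark", quietly = TRUE)) {
  res <- rbenchmark::benchmark(
    thilo = compare(col, ref),
    replications = 1000   # far fewer than the 1000000 runs in the table
  )
  print(res)
}
```

To compare several approaches, each one would be passed as a further named expression in the same `benchmark()` call. A fair comparison would also include the factor conversion step inside the timed expression.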