In short: I'm looking for a general way to fill in the missing values produced by merge(..., all = TRUE, ...) with a constant other than NA.
Let's pretend that
z <- merge(x, y, all = TRUE, ...)
... and that I want all missing values in z (those that arise because a key is absent from x or y) to be filled with the (non-NA) FILL_VALUE.
First, a simple case:
    FILL_VALUE <- "-"
    x <- data.frame(K=1001:1005,
                    I=3:7,
                    R=c(0.1, 0.2, 0.3, 0.4, 0.5),
                    B=c(TRUE, FALSE, TRUE, FALSE, TRUE),
                    C=c(0.1+0.2i, 0.3+0.4i, 0.5+0.6i, 0.7+0.8i, 0.9+1.0i))
    y <- data.frame(K=1001:1003,
                    S1=c("a", "b", "c"),
                    S2=c("d", "e", "f"),
                    stringsAsFactors = FALSE)
    z <- merge(x, y, all = TRUE, by = "K")
    ## > z
    ##      K I   R     B        C   S1   S2
    ## 1 1001 3 0.1  TRUE 0.1+0.2i    a    d
    ## 2 1002 4 0.2 FALSE 0.3+0.4i    b    e
    ## 3 1003 5 0.3  TRUE 0.5+0.6i    c    f
    ## 4 1004 6 0.4 FALSE 0.7+0.8i <NA> <NA>
    ## 5 1005 7 0.5  TRUE 0.9+1.0i <NA> <NA>
In this case, the only NA elements in the result are those introduced by merge, so the following does the job:
    z[is.na(z)] <- FILL_VALUE
    ## > z
    ##      K I   R     B        C S1 S2
    ## 1 1001 3 0.1  TRUE 0.1+0.2i  a  d
    ## 2 1002 4 0.2 FALSE 0.3+0.4i  b  e
    ## 3 1003 5 0.3  TRUE 0.5+0.6i  c  f
    ## 4 1004 6 0.4 FALSE 0.7+0.8i  -  -
    ## 5 1005 7 0.5  TRUE 0.9+1.0i  -  -
Now for a case where this solution fails, because the inputs already contain NA values of their own:
    xna <- data.frame(K=1001:1005,
                      I=c(NA, 4:7),
                      R=c(0.1, NA, 0.3, 0.4, 0.5),
                      B=c(TRUE, FALSE, NA, FALSE, TRUE),
                      C=c(0.1+0.2i, 0.3+0.4i, 0.5+0.6i, NA, 0.9+1.0i))
    yna <- data.frame(K=1001:1003,
                      S1=c(NA, "b", "c"),
                      S2=c("d", NA, "f"),
                      stringsAsFactors = FALSE)
    zna <- merge(xna, yna, all = TRUE, by = "K")
    ## > zna
    ##      K  I   R     B        C   S1   S2
    ## 1 1001 NA 0.1  TRUE 0.1+0.2i <NA>    d
    ## 2 1002  4  NA FALSE 0.3+0.4i    b <NA>
    ## 3 1003  5 0.3    NA 0.5+0.6i    c    f
    ## 4 1004  6 0.4 FALSE       NA <NA> <NA>
    ## 5 1005  7 0.5  TRUE 0.9+1.0i <NA> <NA>
The desired result for zna is one in which only the NA values introduced by merge are replaced with FILL_VALUE, while the NA values already present in xna and yna are left alone; in other words:
    ## > zna
    ##      K  I   R     B        C   S1   S2
    ## 1 1001 NA 0.1  TRUE 0.1+0.2i <NA>    d
    ## 2 1002  4  NA FALSE 0.3+0.4i    b <NA>
    ## 3 1003  5 0.3    NA 0.5+0.6i    c    f
    ## 4 1004  6 0.4 FALSE       NA    -    -
    ## 5 1005  7 0.5  TRUE 0.9+1.0i    -    -
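For illustration, here is a rough sketch of one way the desired result could be produced. It is not a general solution: merge_fill and the marker columns .in_x and .in_y are hypothetical names, and the sketch assumes that x and y share no column names other than the keys (so merge adds no suffixes), that neither input already has a column called .in_x or .in_y, and that fill has a type compatible with the columns being filled (otherwise those columns are coerced).

```r
# Hypothetical helper (not part of base R): outer-merge x and y, then fill
# only the NAs that merge itself introduces for keys missing on one side.
merge_fill <- function(x, y, by, fill) {
  # Marker columns record which input each output row was found in.
  # Assumption: neither input already uses these column names.
  x$.in_x <- TRUE
  y$.in_y <- TRUE

  z <- merge(x, y, by = by, all = TRUE)

  no_x <- is.na(z$.in_x)  # key absent from x
  no_y <- is.na(z$.in_y)  # key absent from y

  # Non-key columns contributed by each side.
  # Assumption: no name clashes, so merge did not rename them.
  x_cols <- setdiff(names(x), c(by, ".in_x"))
  y_cols <- setdiff(names(y), c(by, ".in_y"))

  # Only assign when there is something to fill, so that columns are
  # not touched (and possibly coerced) unnecessarily.
  if (any(no_x) && length(x_cols) > 0) z[no_x, x_cols] <- fill
  if (any(no_y) && length(y_cols) > 0) z[no_y, y_cols] <- fill

  # Drop the markers again.
  z$.in_x <- NULL
  z$.in_y <- NULL
  z
}

# Small demo: the NA already present in `b` stays NA; only the rows whose
# key is missing from the second data frame get the fill value.
left  <- data.frame(K = 1:4, a = c(NA, 2L, 3L, 4L))
right <- data.frame(K = 1:2, b = c(NA, "q"), stringsAsFactors = FALSE)
print(merge_fill(left, right, by = "K", fill = "-"))
```

The marker columns are what distinguish "NA because the key was missing" from "NA that was already in the data", which is.na() alone cannot do after the merge.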
Therefore, this will not do:
    zna[is.na(zna)] <- FILL_VALUE
    ## > zna
    ##      K I   R     B        C S1 S2
    ## 1 1001 - 0.1  TRUE 0.1+0.2i  -  d
    ## 2 1002 4   - FALSE 0.3+0.4i  b  -
    ## 3 1003 5 0.3     - 0.5+0.6i  c  f
    ## 4 1004 6 0.4 FALSE        -  -  -
    ## 5 1005 7 0.5  TRUE   0.9+1i  -  -
Note that this assignment does a lot more than improperly replace several pre-existing NA values with "-"; it also changes the types of several columns, which are coerced to character:
    ## > zna[, "I"]
    ## [1] "-" "4" "5" "6" "7"
    ## > zna[, "B"]
    ## [1] "TRUE"  "FALSE" "-"     "FALSE" "TRUE"
    ## > zna[, "R"]
    ## [1] "0.1" "-"   "0.3" "0.4" "0.5"
    ## > zna[, "C"]
    ## [1] "0.1+0.2i" "0.3+0.4i" "0.5+0.6i" "-"        "0.9+1i"
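The coercion can be checked directly by comparing column classes before and after the blanket assignment. This is a minimal, self-contained sketch on a made-up two-column data frame (df, before and after are illustrative names only):

```r
# Toy data frame with an integer and a logical column, each holding one NA.
df <- data.frame(I = c(NA, 4L), B = c(TRUE, NA))

before <- sapply(df, class)   # classes prior to the fill

# Blanket fill with a character value, as in zna[is.na(zna)] <- FILL_VALUE
df[is.na(df)] <- "-"

after <- sapply(df, class)    # classes after the fill

print(rbind(before, after))
```

Any column that receives the character fill value is converted to character, so a type-preserving approach has to restrict the fill to the cells (and columns) that merge actually created.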