How to fill in the missing values from merge(..., all = TRUE, ...) with a value other than NA?

In short: I'm looking for a general way to fill the missing values in the result of merge(..., all = TRUE, ...) with a constant other than NA.


Suppose that

 z <- merge(x, y, all = TRUE, ...) 

... and that I want every missing value in z that arises because a key is absent from x or from y to be filled with the (non-NA) FILL_VALUE.


First, a simple case:

 FILL_VALUE <- "-"
 x <- data.frame(K=1001:1005,
                 I=3:7,
                 R=c(0.1, 0.2, 0.3, 0.4, 0.5),
                 B=c(TRUE, FALSE, TRUE, FALSE, TRUE),
                 C=c(0.1+0.2i, 0.3+0.4i, 0.5+0.6i, 0.7+0.8i, 0.9+1.0i))
 y <- data.frame(K=1001:1003,
                 S1=c("a", "b", "c"),
                 S2=c("d", "e", "f"),
                 stringsAsFactors = FALSE)
 z <- merge(x, y, all = TRUE, by = "K")
 ## > z
 ##      K I   R     B        C   S1   S2
 ## 1 1001 3 0.1  TRUE 0.1+0.2i    a    d
 ## 2 1002 4 0.2 FALSE 0.3+0.4i    b    e
 ## 3 1003 5 0.3  TRUE 0.5+0.6i    c    f
 ## 4 1004 6 0.4 FALSE 0.7+0.8i <NA> <NA>
 ## 5 1005 7 0.5  TRUE 0.9+1.0i <NA> <NA>

In this case, the only NA elements in the result are those introduced by merge, so the following assignment works:

 z[is.na(z)] <- FILL_VALUE
 ## > z
 ##      K I   R     B        C S1 S2
 ## 1 1001 3 0.1  TRUE 0.1+0.2i  a  d
 ## 2 1002 4 0.2 FALSE 0.3+0.4i  b  e
 ## 3 1003 5 0.3  TRUE 0.5+0.6i  c  f
 ## 4 1004 6 0.4 FALSE 0.7+0.8i  -  -
 ## 5 1005 7 0.5  TRUE 0.9+1.0i  -  -

Now here is a case where this solution fails:

 xna <- data.frame(K=1001:1005,
                   I=c(NA, 4:7),
                   R=c(0.1, NA, 0.3, 0.4, 0.5),
                   B=c(TRUE, FALSE, NA, FALSE, TRUE),
                   C=c(0.1+0.2i, 0.3+0.4i, 0.5+0.6i, NA, 0.9+1.0i))
 yna <- data.frame(K=1001:1003,
                   S1=c(NA, "b", "c"),
                   S2=c("d", NA, "f"),
                   stringsAsFactors = FALSE)
 zna <- merge(xna, yna, all = TRUE, by = "K")
 ## > zna
 ##      K  I   R     B        C   S1   S2
 ## 1 1001 NA 0.1  TRUE 0.1+0.2i <NA>    d
 ## 2 1002  4  NA FALSE 0.3+0.4i    b <NA>
 ## 3 1003  5 0.3    NA 0.5+0.6i    c    f
 ## 4 1004  6 0.4 FALSE       NA <NA> <NA>
 ## 5 1005  7 0.5  TRUE 0.9+1.0i <NA> <NA>

The desired value for zna is one in which only the NA values introduced by merge are replaced with FILL_VALUE; IOW:

 ## > zna
 ##      K  I   R     B        C   S1   S2
 ## 1 1001 NA 0.1  TRUE 0.1+0.2i <NA>    d
 ## 2 1002  4  NA FALSE 0.3+0.4i    b <NA>
 ## 3 1003  5 0.3    NA 0.5+0.6i    c    f
 ## 4 1004  6 0.4 FALSE       NA    -    -
 ## 5 1005  7 0.5  TRUE 0.9+1.0i    -    -

Therefore, this will not do:

 zna[is.na(zna)] <- FILL_VALUE
 ## > zna
 ##      K I   R     B        C S1 S2
 ## 1 1001 - 0.1  TRUE 0.1+0.2i  -  d
 ## 2 1002 4   - FALSE 0.3+0.4i  b  -
 ## 3 1003 5 0.3     - 0.5+0.6i  c  f
 ## 4 1004 6 0.4 FALSE        -  -  -
 ## 5 1005 7 0.5  TRUE   0.9+1i  -  -

Note that this assignment does much more than put "-" in several places where it does not belong; it also changes the types of several columns:

 ## > zna[, "I"]
 ## [1] "-" "4" "5" "6" "7"
 ## > zna[, "B"]
 ## [1] "TRUE"  "FALSE" "-"     "FALSE" "TRUE"
 ## > zna[, "R"]
 ## [1] "0.1" "-"   "0.3" "0.4" "0.5"
 ## > zna[, "C"]
 ## [1] "0.1+0.2i" "0.3+0.4i" "0.5+0.6i" "-"        "0.9+1i"
1 answer

You can do the following:

 FILL_VALUE <- "-"

 xna <- data.frame(K=1001:1005,
                   I=c(NA, 4:7),
                   R=c(0.1, NA, 0.3, 0.4, 0.5),
                   B=c(TRUE, FALSE, NA, FALSE, TRUE),
                   C=c(0.1+0.2i, 0.3+0.4i, 0.5+0.6i, NA, 0.9+1.0i))

 yna <- data.frame(K=1001:1003,
                   S1=c(NA, "b", "c"),
                   S2=c("d", NA, "f"),
                   stringsAsFactors = FALSE)

 # add indicator columns: TRUE on every row of each input
 xna$has_xna <- TRUE
 yna$has_yna <- TRUE

 # merge; rows with no match in yna get NA in has_yna (likewise for xna)
 zna <- merge(xna, yna, all = TRUE, by = "K")
 ## > zna
 ##      K  I   R     B        C has_xna   S1   S2 has_yna
 ## 1 1001 NA 0.1  TRUE 0.1+0.2i    TRUE <NA>    d    TRUE
 ## 2 1002  4  NA FALSE 0.3+0.4i    TRUE    b <NA>    TRUE
 ## 3 1003  5 0.3    NA 0.5+0.6i    TRUE    c    f    TRUE
 ## 4 1004  6 0.4 FALSE       NA    TRUE <NA> <NA>      NA
 ## 5 1005  7 0.5  TRUE 0.9+1.0i    TRUE <NA> <NA>      NA

 # fill in only the NAs that merge introduced for keys missing from yna
 yna_cols <- colnames(zna) %in% colnames(yna)
 zna[, yna_cols][is.na(zna[, yna_cols]) & is.na(zna$has_yna)] <- FILL_VALUE
 zna$has_yna <- NULL  # remove indicator column

 # do the same for xna
 xna_cols <- colnames(zna) %in% colnames(xna)
 zna[, xna_cols][is.na(zna[, xna_cols]) & is.na(zna$has_xna)] <- FILL_VALUE
 zna$has_xna <- NULL  # remove indicator column

 # final result
 ## > zna
 ##      K  I   R     B        C   S1   S2
 ## 1 1001 NA 0.1  TRUE 0.1+0.2i <NA>    d
 ## 2 1002  4  NA FALSE 0.3+0.4i    b <NA>
 ## 3 1003  5 0.3    NA 0.5+0.6i    c    f
 ## 4 1004  6 0.4 FALSE       NA    -    -
 ## 5 1005  7 0.5  TRUE 0.9+1.0i    -    -
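A variation that needs no helper columns (an alternative sketch, not part of the answer above) is to test key membership directly with %in%. It assumes a single key column K, as in the example, and no factor columns among those being filled:

```r
FILL_VALUE <- "-"

xna <- data.frame(K = 1001:1005, I = c(NA, 4:7), R = c(0.1, NA, 0.3, 0.4, 0.5),
                  B = c(TRUE, FALSE, NA, FALSE, TRUE),
                  C = c(0.1+0.2i, 0.3+0.4i, 0.5+0.6i, NA, 0.9+1.0i))
yna <- data.frame(K = 1001:1003, S1 = c(NA, "b", "c"), S2 = c("d", NA, "f"),
                  stringsAsFactors = FALSE)

zna <- merge(xna, yna, all = TRUE, by = "K")

# Rows whose key has no partner in the other input (single key column "K" assumed)
no_y <- !(zna$K %in% yna$K)
no_x <- !(zna$K %in% xna$K)

# Fill only the non-key columns each input contributes. The any() guards
# leave a column untouched, type included, when there is nothing to fill.
if (any(no_y)) {
  for (col in setdiff(names(yna), "K")) zna[[col]][no_y] <- FILL_VALUE
}
if (any(no_x)) {
  for (col in setdiff(names(xna), "K")) zna[[col]][no_x] <- FILL_VALUE
}

print(zna)
```

Here every key of zna occurs in xna, so I, R, B and C keep their integer, double, logical and complex types; only S1 and S2 receive "-".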

The above can easily be turned into a general-purpose wrapper around merge. Another option is to use data.table with the nomatch and on arguments of the [.data.table function.
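A minimal sketch of such a wrapper, built on the indicator-column idea from the answer. The function name merge_fill and the marker columns .in_x / .in_y are hypothetical; it assumes x and y share no non-key column names (otherwise merge() renames them with its .x/.y suffixes) and contain no factor columns:

```r
# Hypothetical helper, not part of base R: an outer merge that replaces only
# the NAs merge() creates for keys missing from x or from y.
merge_fill <- function(x, y, by, fill) {
  mx <- ".in_x"  # marker names, assumed not to clash with existing columns
  my <- ".in_y"
  x[[mx]] <- TRUE
  y[[my]] <- TRUE
  z <- merge(x, y, by = by, all = TRUE)

  # A marker is NA exactly on the rows that had no match in that input
  no_x <- is.na(z[[mx]])
  no_y <- is.na(z[[my]])

  # Fill each input's non-key columns on its unmatched rows; the guards keep
  # column types unchanged when there is nothing to fill.
  if (any(no_x)) {
    for (col in setdiff(names(x), c(by, mx))) z[[col]][no_x] <- fill
  }
  if (any(no_y)) {
    for (col in setdiff(names(y), c(by, my))) z[[col]][no_y] <- fill
  }

  z[[mx]] <- NULL
  z[[my]] <- NULL
  z
}

xna <- data.frame(K = 1001:1005, I = c(NA, 4:7), R = c(0.1, NA, 0.3, 0.4, 0.5),
                  B = c(TRUE, FALSE, NA, FALSE, TRUE),
                  C = c(0.1+0.2i, 0.3+0.4i, 0.5+0.6i, NA, 0.9+1.0i))
yna <- data.frame(K = 1001:1003, S1 = c(NA, "b", "c"), S2 = c("d", NA, "f"),
                  stringsAsFactors = FALSE)

zna <- merge_fill(xna, yna, by = "K", fill = "-")
print(zna)
```

On the question's xna and yna this puts "-" only in S1 and S2 of rows 1004 and 1005, leaving the original NAs and the types of I, R, B and C alone. Because `by` is passed straight to merge(), the same sketch should also work with several key columns.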


Source: https://habr.com/ru/post/1267102/

