R cannot convert NaN to NA

I have a data frame with several factor columns containing NaN that I would like to convert to NA (the NaN values cause a problem when using linear regression objects to predict new data).

    > tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
    > tester1
    [1] "2" "2" "3" "4" "2" "3" "NaN"
    > tester1[is.nan(tester1)] = NA
    > tester1
    [1] "2" "2" "3" "4" "2" "3" "NaN"
    > tester1[is.nan(tester1)] = "NA"
    > tester1
    [1] "2" "2" "3" "4" "2" "3" "NaN"
3 answers

Here's the problem: your vector is of character mode, so of course nothing in it is a number. The last element got interpreted as the string "NaN". Using is.nan only makes sense if the vector is numeric. If you want a value to be missing in a character vector (so that it is handled properly by regression functions), use NA_character_ (without any quotes).

    > tester1 <- c("2", "2", "3", "4", "2", "3", NA_character_)
    > tester1
    [1] "2" "2" "3" "4" "2" "3" NA
    > is.na(tester1)
    [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE

Neither "NA" nor "NaN" is truly missing in a character vector. If for some reason there were "NaN" values in a factor variable, you could just use logical indexing:

    tester1[tester1 == "NaN"] = "NA"
    # but that would not really be a missing value either
    # and it might screw up a factor variable anyway.

    tester1[tester1=="NaN"] <- "NA"
    Warning message:
    In `[<-.factor`(`*tmp*`, tester1 == "NaN", value = "NA") :
      invalid factor level, NAs generated
    ##########
    tester1 <- factor(c("2", "2", "3", "4", "2", "3", NaN))
    > tester1[tester1 =="NaN"] <- NA_character_
    > tester1
    [1] 2 2 3 4 2 3 <NA>
    Levels: 2 3 4 NaN

This last result may be surprising. There is a leftover "NaN" level, but none of the elements is "NaN". Instead, the element that was "NaN" is now a real missing value, shown in print as <NA>.
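
If the leftover "NaN" level is unwanted, base R's droplevels() can remove levels that no element uses. A minimal sketch, assuming the same tester1 factor as above (the name cleaned is only illustrative):

```r
# Rebuild the factor from the example; c() coerces NaN to the string "NaN".
tester1 <- factor(c("2", "2", "3", "4", "2", "3", NaN))

# Turn the "NaN" element into a real missing value.
tester1[tester1 == "NaN"] <- NA

# Drop levels that are no longer used by any element.
cleaned <- droplevels(tester1)
levels(cleaned)
# [1] "2" "3" "4"
```

Whether the unused level matters depends on what is done with the factor afterwards, so treat this as an optional cleanup step.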


You cannot have NaN in a character vector like the one you have here:

    > tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
    > is.nan(tester1)
    [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    > tester1
    [1] "2" "2" "3" "4" "2" "3" "NaN"

Note that R thinks this is a character string.

You can have NaN in a numeric vector:

    > tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
    > as.numeric(tester1)
    [1] 2 2 3 4 2 3 NaN
    > is.nan(as.numeric(tester1))
    [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE

Then, of course, R can convert NaN to NA according to your code:

    > foo <- as.numeric(tester1)
    > foo[is.nan(foo)] <- NA
    > foo
    [1] 2 2 3 4 2 3 NA
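
Since the question is about a data frame with several factor columns, the same idea can be applied column by column. This is only a sketch under assumptions: the data frame df, its columns a, b and n, and the helper nan_to_na are all hypothetical names made up for illustration, and only factor and numeric columns are handled.

```r
# Hypothetical example data; the column names are made up for illustration.
df <- data.frame(
  a = factor(c("2", "3", "NaN")),
  b = factor(c("x", "NaN", "y")),
  n = c(1, NaN, 3)
)

# Hypothetical helper: replace "NaN" levels / NaN values with real NA.
nan_to_na <- function(col) {
  if (is.factor(col)) {
    col[col %in% "NaN"] <- NA   # the string "NaN" becomes a missing value
    droplevels(col)             # remove the now-unused "NaN" level
  } else if (is.numeric(col)) {
    col[is.nan(col)] <- NA      # numeric NaN becomes NA
    col
  } else {
    col                         # leave other column types untouched
  }
}

# Assigning into df[] keeps the result a data frame.
df[] <- lapply(df, nan_to_na)
```

After this, is.na(df$a) flags the former "NaN" entry and levels(df$a) no longer contains "NaN".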

EDIT:

Gavin Simpson reminds me in the comments that in your situation there are much simpler ways to convert what is really "NaN" to "NA":

    tester1 <- gsub("NaN", "NA", tester1)
    tester1
    # [1] "2" "2" "3" "4" "2" "3" "NA"

Solution:

To determine which elements of a character vector are NaN, you need to convert the vector to a numeric vector:

    tester1[is.nan(as.numeric(tester1))] <- "NA"
    tester1
    [1] "2" "2" "3" "4" "2" "3" "NA"

Explanation:

There are several reasons why this does not work as you expect.

First, although NaN means "Not a Number", it has class "numeric" and only makes sense inside a numeric vector.

Second, when it is included in a character vector, the value NaN is silently coerced to the character string "NaN". When you then test the vector for NaN-ness, the character string returns FALSE:

    class(NaN)
    # [1] "numeric"
    c("1", NaN)
    # [1] "1" "NaN"
    is.nan(c("1", NaN))
    # [1] FALSE FALSE
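
Note that the string "NA" is itself an ordinary character value, not a missing one. A short sketch of the difference, assuming the same tester1 vector as in the question (the names is_nan, as_string and as_missing are only illustrative):

```r
tester1 <- c("2", "2", "3", "4", "2", "3", NaN)

# Detect the "NaN" entries by going through a numeric conversion.
is_nan <- is.nan(as.numeric(tester1))

# Quoted "NA": still a regular string, so is.na() does not flag it.
as_string <- tester1
as_string[is_nan] <- "NA"
any(is.na(as_string))
# [1] FALSE

# Unquoted NA: a real missing value inside the character vector.
as_missing <- tester1
as_missing[is_nan] <- NA
is.na(as_missing)
# [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
```

Which of the two is appropriate depends on whether downstream code should treat the entry as missing.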

Source: https://habr.com/ru/post/1398703/

