R cannot convert NaN to NA

Question


I have a data frame with several factor columns containing NaN that I would like to convert to NA (the NaN values cause problems when using linear regression objects to predict new data).

    > tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
    > tester1
    [1] "2" "2" "3" "4" "2" "3" "NaN"
    > tester1[is.nan(tester1)] = NA
    > tester1
    [1] "2" "2" "3" "4" "2" "3" "NaN"
    > tester1[is.nan(tester1)] = "NA"
    > tester1
    [1] "2" "2" "3" "4" "2" "3" "NaN"

+4

r nan na

screechOwl Feb 27 '12 at 10:07


3 answers

You cannot have NaN in a character vector, which is what you have here:

    > tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
    > is.nan(tester1)
    [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    > tester1
    [1] "2" "2" "3" "4" "2" "3" "NaN"

Note that R treats the last element as the character string "NaN", not as a numeric NaN.

You can have NaN in a numeric vector:

    > tester1 <- c("2", "2", "3", "4", "2", "3", NaN)
    > as.numeric(tester1)
    [1] 2 2 3 4 2 3 NaN
    > is.nan(as.numeric(tester1))
    [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE

Then, of course, R can convert NaN to NA using your code:

    > foo <- as.numeric(tester1)
    > foo[is.nan(foo)] <- NA
    > foo
    [1] 2 2 3 4 2 3 NA
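Since the question mentions factor columns, the same idea can be sketched for a factor. This is only an illustration: the factor `f` below is a hypothetical stand-in for one of the question's columns, and it assumes every level other than "NaN" is a numeric string.

```r
# Hypothetical factor column, mirroring the question's data
f <- factor(c("2", "2", "3", "4", "2", "3", NaN))

# as.numeric(f) would return the internal level codes rather than the values,
# so convert to character first
x <- as.numeric(as.character(f))

# Replace NaN with NA in the resulting numeric vector
x[is.nan(x)] <- NA
x
```

Going through `as.character()` matters here because a factor stores integer codes internally, and converting those codes directly would silently give the wrong numbers.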

+5

Gavin Simpson Feb 27 '12 at 22:21


EDIT:

Gavin Simpson reminds me in the comments that in your situation there is a much simpler way to convert what is really the string "NaN" to the string "NA":

    tester1 <- gsub("NaN", "NA", tester1)
    tester1
    # [1] "2" "2" "3" "4" "2" "3" "NA"

Solution:

To determine which elements of the character vector are NaN, you need to convert the vector to a numeric vector:

    tester1[is.nan(as.numeric(tester1))] <- "NA"
    tester1
    # [1] "2" "2" "3" "4" "2" "3" "NA"

Explanation:

There are several reasons why this does not work as you expect.

First, although NaN means "Not a Number", it has class "numeric" and only makes sense inside a numeric vector.

Second, when it is included in a character vector, the value NaN is silently coerced to the character string "NaN". When you then test that string for NaN-ness, is.nan() returns FALSE:

    class(NaN)
    # [1] "numeric"
    c("1", NaN)
    # [1] "1" "NaN"
    is.nan(c("1", NaN))
    # [1] FALSE FALSE
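As a small illustrative sketch of the distinction (the vector `x` is made up for this example), compare how `is.nan()` and `is.na()` treat a numeric vector and the string "NaN":

```r
# Illustrative numeric vector containing both NaN and NA
x <- c(1, NaN, NA)

nan_flags <- is.nan(x)  # TRUE only for the NaN element
na_flags  <- is.na(x)   # TRUE for both NaN and NA

nan_flags
na_flags

# The character string "NaN" is neither NaN nor NA
is.nan("NaN")
is.na("NaN")
```

In a numeric vector `is.na()` also flags NaN, whereas the string "NaN" is an ordinary non-missing value as far as both functions are concerned.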

+4

Josh O'Brien Feb 27 '12 at 22:12


42- · Accepted Answer · 2012-02-27T22:17:50+0000

Here's the problem: your vector is of character mode, so of course none of its elements is NaN. The last element is interpreted as the string "NaN". Using is.nan only makes sense if the vector is numeric. If you want a value to be missing in a character vector (so that it is handled correctly by regression functions), use NA_character_ (without quotes).

    > tester1 <- c("2", "2", "3", "4", "2", "3", NA_character_)
    > tester1
    [1] "2" "2" "3" "4" "2" "3" NA
    > is.na(tester1)
    [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE

Neither "NA" nor "NaN" is truly missing in a character vector. If for some reason there were "NaN" values in a factor variable, you could just use logical indexing:

    > tester1[tester1 == "NaN"] = "NA"
    # but that would not really be a missing value either
    # and it might screw up a factor variable anyway.
    > tester1[tester1=="NaN"] <- "NA"
    Warning message:
    In `[<-.factor`(`*tmp*`, tester1 == "NaN", value = "NA") :
      invalid factor level, NAs generated
    ##########
    > tester1 <- factor(c("2", "2", "3", "4", "2", "3", NaN))
    > tester1[tester1 =="NaN"] <- NA_character_
    > tester1
    [1] 2 2 3 4 2 3 <NA>
    Levels: 2 3 4 NaN

This last result may be unexpected. There is a remaining "NaN" level, but none of the elements are "NaN". Instead, the element that was "NaN" is now a real missing value, shown in print as <NA>.
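For a data frame with several factor columns, as described in the question, the same replacement could be applied column by column. The following is a minimal sketch, not a definitive recipe: the data frame `df` and its columns `a` and `b` are hypothetical, and `droplevels()` is used here to remove the leftover "NaN" level.

```r
# Hypothetical data frame with factor columns containing the string "NaN"
df <- data.frame(
  a = factor(c("2", "3", "NaN")),
  b = factor(c("NaN", "x", "y"))
)

# Identify the factor columns
is_fac <- vapply(df, is.factor, logical(1))

# In each factor column, mark "NaN" entries as missing,
# then drop the now-unused "NaN" level
df[is_fac] <- lapply(df[is_fac], function(col) {
  col[col == "NaN"] <- NA
  droplevels(col)
})

levels(df$a)
is.na(df$b)
```

Whether to drop the unused level is a design choice; keeping it is harmless for most modelling functions, but removing it avoids a level with no observations.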
