I work in R and have a data frame, dd_2006, with numeric vectors. When I first imported the data, I needed to remove $ signs, decimal points and some spaces from three of my variables: SumOfCost, SumOfCases and SumOfUnits. For this, I used str_replace_all. However, as soon as I used str_replace_all, the vectors were converted to character. I therefore used as.numeric(var) to convert them back to numeric, but NAs were introduced, even though the check below, run BEFORE the as.numeric code, found no NAs in the vectors.
sum(is.na(dd_2006$SumOfCost))
[1] 0
sum(is.na(dd_2006$SumOfCases))
[1] 0
sum(is.na(dd_2006$SumOfUnits))
[1] 0
Here is my code after import, starting with removing the $ from the vector. In the output of str(dd_2006) I deleted some of the variables to save space, so the column numbers in the str_replace_all code below do not match the output posted here (but they do in the source code):
library("stringr")
dd_2006$SumOfCost <- str_sub(dd_2006$SumOfCost, 2, ) #2=the first # after the $

#Removes decimal pt, zero after, and commas
dd_2006[ ,9] <- str_replace_all(dd_2006[ ,9], ".00", "")
dd_2006[,9] <- str_replace_all(dd_2006[,9], ",", "")

dd_2006[ ,10] <- str_replace_all(dd_2006[ ,10], ".00", "")
dd_2006[ ,10] <- str_replace_all(dd_2006[,10], ",", "")

dd_2006[ ,11] <- str_replace_all(dd_2006[ ,11], ".00", "")
dd_2006[,11] <- str_replace_all(dd_2006[,11], ",", "")

str(dd_2006)
'data.frame': 12604 obs. of 14 variables:
 $ CMHSP      : Factor w/ 46 levels "Allegan","AuSable Valley",..: 1 1 1
 $ FY         : Factor w/ 1 level "2006": 1 1 1 1 1 1 1 1 1 1 ...
 $ Population : Factor w/ 1 level "DD": 1 1 1 1 1 1 1 1 1 1 ...
 $ SumOfCases : chr "0" "1" "0" "0" ...
 $ SumOfUnits : chr "0" "365" "0" "0" ...
 $ SumOfCost  : chr "0" "96416" "0" "0" ...
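A note on the pattern ".00" used above: in a regular expression the dot matches any character, so ".00" matches more than a literal decimal point followed by two zeros. The sketch below uses made-up sample values (not taken from dd_2006) and base R's gsub and sub rather than stringr, to show what that pattern can do and how an escaped, end-anchored pattern behaves instead. Whether this is what happened in the real data is only a guess.

```r
# Hypothetical sample values, not taken from the real data
x <- c("96416.00", "1,000.00", "100.00", "2005")

# "." is a regex wildcard here: ".00" matches ANY character followed by "00"
as_regex <- gsub(".00", "", x)

# Escaped dot, anchored at the end: only a trailing literal ".00" is removed
as_literal <- sub("\\.00$", "", x)

# Remove thousands separators as a literal comma
as_literal <- gsub(",", "", as_literal, fixed = TRUE)

print(as_regex)              # some values are mangled, one becomes ""
print(as_literal)
print(as.numeric(as_regex))  # the empty string converts to NA
```

An empty string is not NA, so sum(is.na(...)) on the character vector still reports 0, but as.numeric("") yields NA. The same escaped pattern, "\\.00$", can also be passed to str_replace_all.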
I found an answer to a similar question to mine here using the following code:
# create dummy data.frame
d <- data.frame(char = letters[1:5],
                fake_char = as.character(1:5),
                fac = factor(1:5),
                char_fac = factor(letters[1:5]),
                num = 1:5,
                stringsAsFactors = FALSE)
Let's take a look at the data.frame
> d
  char fake_char fac char_fac num
1    a         1   1        a   1
2    b         2   2        b   2
3    c         3   3        c   3
4    d         4   4        d   4
5    e         5   5        e   5
and run:
> sapply(d, mode)
       char   fake_char         fac    char_fac         num
"character" "character"   "numeric"   "numeric"   "numeric"
> sapply(d, class)
       char   fake_char         fac    char_fac         num
"character" "character"    "factor"    "factor"   "integer"
Now you are probably asking yourself: "Where is the anomaly?" Well, I came across very peculiar things in R, and this is not the most unpleasant thing, but it can confuse you, especially if you read this before climbing into bed.
Here: the first two columns are character. I deliberately called the second one fake_char. Notice the similarity of this character variable to the one Dirk created in his answer. It is actually a numeric vector converted to character. The third and fourth columns are factors, and the last one is purely numeric.
If you use the transform function, you can convert fake_char to numeric, but not the char variable itself.
> transform(d, char = as.numeric(char))
  char fake_char fac char_fac num
1   NA         1   1        a   1
2   NA         2   2        b   2
3   NA         3   3        c   3
4   NA         4   4        d   4
5   NA         5   5        e   5
Warning message:
In eval(expr, envir, enclos) : NAs introduced by coercion

but if you do the same thing on fake_char and char_fac, you'll be lucky, and get away with no NA's:
> transform(d, fake_char = as.numeric(fake_char), char_fac = as.numeric(char_fac))
  char fake_char fac char_fac num
1    a         1   1        1   1
2    b         2   2        2   2
3    c         3   3        3   3
4    d         4   4        4   4
5    e         5   5        5   5
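The "lucky" result for char_fac comes with a caveat: as.numeric on a factor returns the internal level codes, not the labels parsed as numbers. A minimal sketch with a made-up factor (the values are illustrative only) shows the difference and the usual workaround of going through as.character first.

```r
# Hypothetical factor whose labels look like numbers
f <- factor(c("10", "20", "30"))

# as.numeric on a factor gives the integer level codes
codes <- as.numeric(f)

# Converting to character first parses the labels themselves
values <- as.numeric(as.character(f))

print(codes)   # 1 2 3
print(values)  # 10 20 30
```

This matters for columns that were imported as factors and never passed through a string function, since their codes can look like plausible numbers.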
So, I tried the above code in my script, but still ended up with NAs (without any warning about coercion).
#changing sumofcases, cost, and units to numeric
dd_2006_1 <- transform(dd_2006,
                       SumOfCases = as.numeric(SumOfCases),
                       SumOfUnits = as.numeric(SumOfUnits),
                       SumOfCost = as.numeric(SumOfCost))

> sum(is.na(dd_2006_1$SumOfCost))
[1] 12
> sum(is.na(dd_2006_1$SumOfCases))
[1] 7
> sum(is.na(dd_2006_1$SumOfUnits))
[1] 11
I also used table(dd_2006$SumOfCases), etc., to look at the observations and see whether I had missed any stray characters, but there were none. Any thoughts on why the NAs appear, and how to get rid of them?
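One way to narrow this down is to list exactly which strings fail to convert, rather than scanning the table output by eye. The sketch below is only an illustration: the vector x is a made-up stand-in for one of the cleaned columns, and the names converted and failed are hypothetical.

```r
# Hypothetical stand-in for one cleaned character column
x <- c("0", "96416", "", "365", "n/a", NA)

# Convert, silencing the coercion warning so the check runs quietly
converted <- suppressWarnings(as.numeric(x))

# Entries that were not NA before conversion but are NA afterwards
failed <- is.na(converted) & !is.na(x)

print(unique(x[failed]))  # the offending strings
print(which(failed))      # their row positions
```

Running the same three steps on dd_2006$SumOfCost, dd_2006$SumOfCases and dd_2006$SumOfUnits before the transform call should show the 12, 7 and 11 problem values directly, including empty strings, which are easy to overlook in table output.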