Handling Null Values

I import a CSV file into R using the sqldf package. The file has several missing values in both numeric and string variables. I notice that the missing values remain empty in the data frame (as opposed to being filled with NA or something else). I want to replace the missing values with a user-defined value. Obviously, a function like is.na() will not work in this case.

A toy data frame with three columns:

A  B  C  
3  4  
2  4  6   
34 23 43   
2  5   

I want to get:

A  B  C  
3  4  NA  
2  4  6   
34 23 43   
2  5  NA 

Thanks in advance.

1 answer

Assuming that you are using read.csv.sql in sqldf with the default SQLite database, it creates a factor column for C, so you can:

(1) convert the values to numeric using as.numeric(as.character(...)):

> Lines <- "A,B,C
+ 3,4,
+ 2,4,6
+ 34,23,43
+ 2,5,
+ "
> cat(Lines, file = "stest.csv")
> library(sqldf)
> DF <- read.csv.sql("stest.csv")
> str(DF)
'data.frame':   4 obs. of  3 variables:
 $ A: int  3 2 34 2
 $ B: int  4 4 23 5
 $ C: Factor w/ 3 levels "","43","6": 1 3 2 1
> DF$C <- as.numeric(as.character(DF$C))
> str(DF)
'data.frame':   4 obs. of  3 variables:
 $ A: int  3 2 34 2
 $ B: int  4 4 23 5
 $ C: num  NA 6 43 NA
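
The detour through as.character() in (1) matters because calling as.numeric() directly on a factor returns the internal level codes rather than the printed values. A minimal sketch, using a hand-built factor `f` as a hypothetical stand-in for DF$C rather than the file above:

```r
# Hypothetical stand-in for DF$C: a factor with empty strings for missing values
f <- factor(c("", "6", "43", ""))

# Direct conversion returns the level codes (levels sort as "", "43", "6")
codes <- as.numeric(f)

# Going through character first recovers the values; "" becomes NA
values <- as.numeric(as.character(f))

print(codes)   # 1 3 2 1
print(values)  # NA 6 43 NA
```

The codes match the `1 3 2 1` shown in the str() output for the factor column, which is why the extra as.character() step is needed.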

(2) or use method = "raw" (as in sqldf(..., method = "raw")), after which as.numeric alone is enough:

> DF <- read.csv.sql("stest.csv", method = "raw")
> str(DF)
'data.frame':   4 obs. of  3 variables:
 $ A: int  3 2 34 2
 $ B: int  4 4 23 5
 $ C: chr  "" "6" "43" ""
> DF$C <- as.numeric(DF$C)
> str(DF)
'data.frame':   4 obs. of  3 variables:
 $ A: int  3 2 34 2
 $ B: int  4 4 23 5
 $ C: num  NA 6 43 NA

(3) or, if it is feasible to use read.csv instead, the empty fields are filled with NA right away:

> str(read.csv("stest.csv"))
'data.frame':   4 obs. of  3 variables:
 $ A: int  3 2 34 2
 $ B: int  4 4 23 5
 $ C: int  NA 6 43 NA
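
The three options above deal with the numeric column. For string columns, and for substituting a user-defined value instead of NA, something along these lines may work. This is only a sketch: `fill_missing` and its `fill` argument are hypothetical names, not part of sqldf or base R, and the toy data frame assumes that missing values arrive as empty strings, as they do with method = "raw".

```r
# Hypothetical helper: replace empty strings and NA in one column with `fill`
fill_missing <- function(x, fill) {
  if (is.factor(x)) x <- as.character(x)                 # work on labels, not codes
  if (is.character(x)) x[!is.na(x) & x == ""] <- NA      # treat "" as missing
  x[is.na(x)] <- fill
  x
}

# Toy data resembling a "raw" import: every missing field is an empty string
DF <- data.frame(A = c(3, 2, 34, 2),
                 C = c("", "6", "43", ""),
                 S = c("x", "", "y", ""),
                 stringsAsFactors = FALSE)

DF$C <- fill_missing(as.numeric(DF$C), fill = 0)   # numeric column: "" -> NA -> 0
DF$S <- fill_missing(DF$S, fill = "unknown")       # string column: "" -> "unknown"
str(DF)
```

When read.csv is an option, passing na.strings = c("", "NA") marks empty string fields as NA at import time, so is.na() can then be used on string columns as well.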

Source: https://habr.com/ru/post/1763278/