I do this all the time, and it's a pain in the butt, so I wrote a method for it called sparsify() in my R package, mltools. It operates on data.tables, which are just fancy data.frames.
To solve your specific problem ...
Install mltools (or just copy sparsify() into your environment)
Load packages
library(data.table)
library(Matrix)
library(mltools)
Sparsify
x <- data.table(x)
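The conversion above only produces the data.table; the sparse matrix comes from calling sparsify() on it. Here is a minimal end-to-end sketch. The data.frame below is a hypothetical stand-in, since the asker's actual `x` is not shown here, and `sparseM` is just an illustrative name.

```r
library(data.table)
library(Matrix)
library(mltools)

# Hypothetical stand-in for the asker's data; the real `x` is not shown here
x <- data.frame(a = c(0, 0, 3), b = c(1, 0, 0))

x <- data.table(x)      # sparsify() works on a data.table
sparseM <- sparsify(x)  # sparse matrix; zeros are stored implicitly

class(sparseM)
dim(sparseM)
```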
In general, the sparsify() method is quite flexible. Here are some examples of how you can use it:
Make some data. Pay attention to data types and unused factor levels.
dt <- data.table(
  intCol=c(1L, NA_integer_, 3L, 0L),
  realCol=c(NA, 2, NA, NA),
  logCol=c(TRUE, FALSE, TRUE, FALSE),
  ofCol=factor(c("a", "b", NA, "b"), levels=c("a", "b", "c"), ordered=TRUE),
  ufCol=factor(c("a", NA, "c", "b"), ordered=FALSE)
)
> dt
   intCol realCol logCol ofCol ufCol
1:      1      NA   TRUE     a     a
2:     NA       2  FALSE     b    NA
3:      3      NA   TRUE    NA     c
4:      0      NA  FALSE     b     b
Out-of-Box Use
> sparsify(dt)
4 x 7 sparse Matrix of class "dgCMatrix"
     intCol realCol logCol ofCol ufCol_a ufCol_b ufCol_c
[1,]      1      NA      1     1       1       .       .
[2,]     NA       2      .     2      NA      NA      NA
[3,]      3      NA      1    NA       .       .       1
[4,]      .      NA      .     2       .       1       .
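To see where the seven columns come from: in the output above, the ordered factor stays a single column holding its integer level codes (the unused level "c" adds nothing), while the unordered factor is expanded into one indicator column per level. The base R sketch below reproduces those values by hand for illustration only. It is not how sparsify() is implemented internally, and `onehot` is a made-up name.

```r
ofCol <- factor(c("a", "b", NA, "b"), levels = c("a", "b", "c"), ordered = TRUE)
ufCol <- factor(c("a", NA, "c", "b"), ordered = FALSE)

# Ordered factor: one column of integer level codes
as.integer(ofCol)  # 1 2 NA 2

# Unordered factor: one 0/1 column per level; a missing value is NA in every column
onehot <- sapply(levels(ufCol), function(lv) as.integer(ufCol == lv))
onehot
```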
Convert NAs to 0s and sparsify them
> sparsify(dt, sparsifyNAs=TRUE)
4 x 7 sparse Matrix of class "dgCMatrix"
     intCol realCol logCol ofCol ufCol_a ufCol_b ufCol_c
[1,]      1       .      1     1       1       .       .
[2,]      .       2      .     2       .       .       .
[3,]      3       .      1     .       .       .       1
[4,]      .       .      .     2       .       1       .
Generate columns that identify NA values
> sparsify(dt[, list(realCol)], naCols="identify")
4 x 2 sparse Matrix of class "dgCMatrix"
     realCol_NA realCol
[1,]          1      NA
[2,]          .       2
[3,]          1      NA
[4,]          1      NA
Generate columns that identify NA values in the most memory-efficient manner
> sparsify(dt[, list(realCol)], naCols="efficient")
4 x 2 sparse Matrix of class "dgCMatrix"
     realCol_NotNA realCol
[1,]             .      NA
[2,]             1       2
[3,]             .      NA
[4,]             .      NA
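The difference between the two naCols outputs can be sketched in base R. The assumption here, inferred from the output above rather than from the mltools source, is that "efficient" flags whichever case is rarer so the indicator column stores fewer nonzero entries. How ties are broken is not shown above, so the `else` branch is a guess.

```r
realCol <- c(NA, 2, NA, NA)

nNA    <- sum(is.na(realCol))   # 3 missing values
nNotNA <- sum(!is.na(realCol))  # 1 observed value

# Assumption: flag the rarer case so fewer nonzero entries need storing
indicator <- if (nNA > nNotNA) {
  as.integer(!is.na(realCol))   # behaves like a "_NotNA" column
} else {
  as.integer(is.na(realCol))    # behaves like a "_NA" column
}
indicator  # 0 1 0 0
```

With three of the four values missing, flagging the non-missing row needs one stored entry instead of three, which lines up with the realCol_NotNA column shown above.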