Convert a factor column to multiple Boolean columns

I have data that looks like this:

    library(data.table)
    DT <- data.table(x=rep(1:5, 2))

I would like to break this data into 5 boolean columns that indicate the presence of each number.

I can do it like this:

    new.names <- sort(unique(DT$x))
    DT[, paste0('col', new.names) := lapply(new.names, function(i) DT$x==i)]

But this uses a pesky lapply, which is probably slower than a native data.table alternative, and the solution does not seem very "data.table-ish" to me.

Is there a better and / or faster way to create these new columns?

3 answers

What about model.matrix?

    model.matrix(~factor(x)-1,data=DT)
       factor(x)1 factor(x)2 factor(x)3 factor(x)4 factor(x)5
    1           1          0          0          0          0
    2           0          1          0          0          0
    3           0          0          1          0          0
    4           0          0          0          1          0
    5           0          0          0          0          1
    6           1          0          0          0          0
    7           0          1          0          0          0
    8           0          0          1          0          0
    9           0          0          0          1          0
    10          0          0          0          0          1
    attr(,"assign")
    [1] 1 1 1 1 1
    attr(,"contrasts")
    attr(,"contrasts")$`factor(x)`
    [1] "contr.treatment"
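
Note that model.matrix returns a numeric 0/1 matrix rather than Boolean columns inside DT. A minimal sketch of one way to get from that matrix to the logical columns asked for in the question is shown here; the helper names mm and lv are only illustrative, and the col1 ... col5 naming is borrowed from the question.

```r
library(data.table)

DT <- data.table(x = rep(1:5, 2))

# Numeric 0/1 indicator matrix; "- 1" drops the intercept so that
# every level of x gets its own column.
mm <- model.matrix(~ factor(x) - 1, data = DT)

# Columns of mm follow the sorted factor levels, so name them the same way.
lv <- sort(unique(DT$x))
new.names <- paste0("col", lv)

# as.logical() turns 0/1 into FALSE/TRUE and drops the row names
# carried by the matrix; := adds the columns to DT by reference.
DT[, (new.names) := lapply(seq_along(lv), function(j) as.logical(mm[, j]))]
```

After this, DT holds x plus five logical columns, which should match what the lapply version in the question produces.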

Apparently, you can put model.matrix in [.data.table to give the same results. Not sure if this will be faster:

    DT[,model.matrix(~factor(x)-1)]
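
Since the speed question is left open, one way to check it is to time both approaches on a larger table. The sketch below is only an assumption about how such a comparison could be set up: the table big, its size, and the seed are arbitrary, and the timings will differ between machines, so no result is claimed here.

```r
library(data.table)

# Hypothetical larger input so that the timings are measurable.
set.seed(1)
big <- data.table(x = sample(1:5, 1e5, replace = TRUE))
lv <- sort(unique(big$x))

# Approach from the question: one logical vector per level.
t.lapply <- system.time(
  res.lapply <- lapply(lv, function(i) big$x == i)
)

# Approach from this answer: a numeric 0/1 design matrix.
t.mm <- system.time(
  res.mm <- big[, model.matrix(~ factor(x) - 1)]
)

# Check that both results encode the same indicators before comparing times.
same <- vapply(seq_along(lv),
               function(j) all(res.lapply[[j]] == (res.mm[, j] == 1)),
               logical(1))
stopifnot(all(same))

# Elapsed seconds for each approach.
print(c(lapply = t.lapply[["elapsed"]], model.matrix = t.mm[["elapsed"]]))
```

For more reliable numbers, the same two expressions could be run several times, or through a benchmarking package, rather than timed once.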

There is also nnet::class.ind

    library(nnet)
    cbind(DT, setnames(as.data.table(DT[, class.ind(x)]), paste0('col', sort(unique(DT$x)))))
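
One detail worth watching with class.ind is column order: it converts its input to a factor, so the indicator columns follow the sorted levels, not the order in which values first appear. The sketch below uses a small, deliberately unsorted table (made up for illustration) and takes the new names from the matrix itself, which avoids any mismatch; it also converts the 0/1 values to logical.

```r
library(data.table)
library(nnet)

# Illustrative data in which order of appearance differs from sorted order.
DT <- data.table(x = c(3L, 1L, 2L, 1L))

# Indicator matrix; column names are the factor levels "1", "2", "3".
ind <- class.ind(DT$x)

# Derive the new column names from the matrix rather than from unique(DT$x),
# which would give 3, 1, 2 here.
new.names <- paste0("col", colnames(ind))

# ind == 1 is a logical matrix; as.data.table() splits it into columns,
# and := adds them to DT by reference.
DT[, (new.names) := as.data.table(ind == 1)]
```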

You can also cast the table from long to wide with dcast:

    library(data.table)
    DT <- data.table(x=rep(1:5, 2))
    # add column with id
    DT[, id := seq.int(nrow(DT))]
    # cast long table into wide
    DT.wide <- dcast(DT, id ~ x, value.var = "x", fill = 0, fun.aggregate = function(x) 1)

Source: https://habr.com/ru/post/919798/

