Suppose you have a data frame with a large number of columns (1000 factors, each of which has 15 levels). You want to create a dummy-variable data set, but since it will be very sparse, you would like to store the dummies in a sparse matrix format.
My dataset is quite large, so the fewer steps, the better. I know how to do this in two steps, but I could not work out how to create the sparse matrix directly from the original dataset, i.e. in one step instead of two. Any ideas?
EDIT: some comments asked for more detail, so here it is:
Here X is my original dataset with 1000 columns and 50,000 records; each column has 15 levels.
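To make this reproducible, here is a scaled-down stand-in for X. The sizes, the seed, the level labels (`lvl1` ... `lvl15`) and the column names (`V1`, `V2`, ...) are made up for illustration; only the "15 levels per column" shape matches the real data.

```r
# Toy stand-in for X: same structure as the real data, far fewer rows/columns
set.seed(42)
n_rows <- 500   # the real data has 50,000 records
n_cols <- 20    # the real data has 1000 columns
l <- 15         # levels per column, as in the real data
lvls <- paste0("lvl", seq_len(l))

# One factor per column, each declared with the full set of l levels
cols <- replicate(
  n_cols,
  factor(sample(lvls, n_rows, replace = TRUE), levels = lvls),
  simplify = FALSE
)
X <- as.data.frame(setNames(cols, paste0("V", seq_len(n_cols))))
```

Declaring `levels = lvls` explicitly keeps every column at exactly 15 levels even if a level happens not to be sampled.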
Step 1: Create dummy variables from the source dataset using code:
# Number of levels per column (15 in this dataset)
l <- 15
# Pre-allocate the dense dummy matrix, filled with NA
dummified <- matrix(NA, nrow(X), l * ncol(X))
# Fill one 0/1 indicator column per (column, level) pair
for (i in 1:ncol(X)) {
  colFactr <- factor(X[, i], exclude = NULL)
  for (j in 1:l) {
    lvl <- levels(colFactr)[j]
    indx <- ((i - 1) * l) + j
    dummified[, indx] <- ifelse(colFactr == lvl, 1, 0)
  }
}
Step 2: transform this huge matrix into a sparse matrix with this code:
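# sparseMatrix() expects index vectors, so the dense matrix is converted with Matrix()
library(Matrix)
sparse.dummified <- Matrix(dummified, sparse = TRUE)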
But this approach still creates the large intermediate matrix, which takes a lot of time and memory, so I am looking for a direct method (if any).
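For concreteness, this is roughly the kind of one-step construction I have in mind: a minimal sketch only, not tested at full scale. It assumes the Matrix package, that every column is a factor with the same `l` levels, and that there are no missing values (unlike the `exclude = NULL` handling above, an `NA` would break the index computation). The toy `X` and its sizes are made up for illustration.

```r
library(Matrix)

# Toy stand-in for X (hypothetical sizes, far smaller than the real data)
set.seed(1)
n <- 6; p <- 3; l <- 4
lvls <- letters[seq_len(l)]
X <- as.data.frame(setNames(
  replicate(p, factor(sample(lvls, n, replace = TRUE), levels = lvls),
            simplify = FALSE),
  paste0("V", seq_len(p))
))

# Integer level codes: an n x p matrix with values in 1..l
codes <- vapply(X, as.integer, integer(n))

# Cell (r, c) with code k becomes a 1 at row r, column (c - 1) * l + k.
# Both index vectors are laid out in column-major order.
i <- rep(seq_len(n), times = p)
j <- as.vector(codes + rep((seq_len(p) - 1) * l, each = n))

# Build the sparse matrix from the index triplets; no dense matrix is created
sparse.dummified <- sparseMatrix(i = i, j = j, x = 1, dims = c(n, p * l))
```

Only `n * p` nonzero entries are ever stored, so memory should scale with the number of cells in X rather than with `n * p * l`.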