Write a sparse matrix to CSV in R

I have a sparse matrix (dgCMatrix) as the result of fitting a glmnet model. I want to write this result to a .csv file, but I can't use write.table() on the matrix, because it cannot be coerced to a data.frame.

Is there a way to coerce a sparse matrix to either a data.frame or a regular matrix? Or is there a way to write it to a file while preserving the names of the coefficients, which are presumably the row names?

+4
4 answers

as.matrix() converts it to a full dense representation:

    > library(Matrix)
    > as.matrix(Matrix(0, 3, 2))
         [,1] [,2]
    [1,]    0    0
    [2,]    0    0
    [3,]    0    0

You can write the resulting object using write.csv or write.table.
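A minimal sketch of that route, for a matrix small enough to densify. The `coefs` object is a hypothetical stand-in for a glmnet coefficient matrix (a one-column dgCMatrix whose row names are the coefficient names); the values and names are made up for illustration.

```r
library(Matrix)

# Hypothetical stand-in for a glmnet coefficient matrix:
# one column, row names are the coefficient names.
coefs <- sparseMatrix(
  i = c(1, 3), j = c(1, 1), x = c(0.5, -1.2),
  dims = c(4, 1),
  dimnames = list(c("(Intercept)", "x1", "x2", "x3"), "s0")
)

dense <- as.matrix(coefs)           # ordinary numeric matrix, dimnames are kept
outfile <- tempfile(fileext = ".csv")
write.csv(dense, file = outfile)    # row names are written as the first column

# Read it back, using the first column as row names again
back <- read.csv(outfile, row.names = 1)
```

The coefficient names survive the round trip as the row names of `back`.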

+4

Converting a sparse matrix to a regular one is dangerous if the sparse matrix is large. In my case (a text classification task) I had a matrix of 22,490 by 120,000. A dense version of that would need more than 20 GB, so R would most likely run out of memory and fail.

So my suggestion is to save the sparse matrix in a memory-efficient format instead, for example the Matrix Market format, which stores only the non-zero values and their coordinates (row number and column number). In R, you can use the writeMM function from the Matrix package.
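The size estimate can be checked with a quick calculation, assuming the dense matrix stores each entry as an 8-byte double (R's default numeric type):

```r
n_row <- 22490
n_col <- 120000
bytes_per_double <- 8   # assumption: dense numeric matrix of doubles

# Size of the dense matrix in gigabytes (10^9 bytes)
dense_gb <- n_row * n_col * bytes_per_double / 1e9
dense_gb   # about 21.6
```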
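A rough sketch of the Matrix Market route, using a small made-up matrix. As far as I know the .mtx file holds only dimensions, coordinates and values, not row or column names, so this sketch saves the dimnames separately with saveRDS(); that part is my own workaround, not something writeMM does for you.

```r
library(Matrix)

# Small illustrative matrix with row and column names
m <- sparseMatrix(
  i = c(1, 3, 2), j = c(1, 1, 3), x = c(0.5, -1.2, 2),
  dims = c(3, 3),
  dimnames = list(c("a", "b", "c"), c("u", "v", "w"))
)

mtx_file <- tempfile(fileext = ".mtx")
writeMM(m, file = mtx_file)

# Assumption: names are not stored in the .mtx file, so keep them in a side file
names_file <- tempfile(fileext = ".rds")
saveRDS(dimnames(m), names_file)

# Read back: readMM() returns a triplet-format sparse matrix
m2 <- as(readMM(mtx_file), "CsparseMatrix")
dimnames(m2) <- readRDS(names_file)
```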

+7

    # input: a sparse matrix with named rows and columns (dimnames)
    # returns: a data frame of triplets (row, col, x) suitable for writing to a CSV file
    sparse2triples <- function(m) {
      SM <- summary(m)                    # one row per non-zero entry: i, j, x
      D1 <- m@Dimnames[[1]][SM[, 1]]      # row names of the non-zero entries
      D2 <- m@Dimnames[[2]][SM[, 2]]      # column names of the non-zero entries
      data.frame(row = D1, col = D2, x = SM$x)
    }

Example

    > library(Matrix)
    > dn <- list(LETTERS[1:3], letters[1:5])
    > m <- sparseMatrix(i = c(3,1,3,2,2,1), p = c(0:2, 4,4,6), x = 1:6, dimnames = dn)
    > m
    3 x 5 sparse Matrix of class "dgCMatrix"
      a b c d e
    A . 2 . . 6
    B . . 4 . 5
    C 1 . 3 . .
    > sparse2triples(m)
      row col x
    1   C   a 1
    2   A   b 2
    3   B   c 4
    4   C   c 3
    5   A   e 6
    6   B   e 5
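For completeness, a sketch of the full round trip: write the named triplets to CSV and rebuild the sparse matrix from them. It builds the triplets inline rather than calling the function, so it runs on its own. Note that rows or columns with no non-zero entries do not appear in the triplets, so this sketch takes the full name vectors and dimensions from the original matrix; in practice you would have to store them yourself.

```r
library(Matrix)

dn <- list(LETTERS[1:3], letters[1:5])
m <- sparseMatrix(i = c(3,1,3,2,2,1), p = c(0:2, 4,4,6), x = 1:6, dimnames = dn)

# Named triplets, one row per non-zero entry
s <- summary(m)
triples <- data.frame(row = rownames(m)[s$i], col = colnames(m)[s$j], x = s$x)

csv_file <- tempfile(fileext = ".csv")
write.csv(triples, csv_file, row.names = FALSE)

# Rebuild: map names back to indices. Column "d" is all zero and absent from
# the CSV, so the full name vectors are taken from the original matrix here.
back <- read.csv(csv_file, stringsAsFactors = FALSE)
m2 <- sparseMatrix(
  i = match(back$row, rownames(m)),
  j = match(back$col, colnames(m)),
  x = back$x,
  dims = dim(m),
  dimnames = dimnames(m)
)
```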

[EDIT: use data.frame]

+3

Converting directly to a dense matrix is likely to use a lot of memory. The Matrix package lets you convert a sparse matrix into a data frame in triplet format (row index, column index, value) using the summary() function, and that data frame can then easily be written to CSV. This is probably simpler than the Matrix Market approach. See the answer to this related question: Sparse matrix in data frame in R

In addition, here is an illustration from the Matrix documentation:

    ## very simple export - in triplet format - to text file:
    data(CAex)
    s.CA <- summary(CAex)
    s.CA # shows (i, j, x) [columns of a data frame]
    message("writing to ", outf <- tempfile())
    write.table(s.CA, file = outf, row.names=FALSE)
    ## and read it back -- showing off sparseMatrix():
    str(dd <- read.table(outf, header=TRUE))
    ## has columns (i, j, x) -> we can use via do.call() as arguments to sparseMatrix():
    mm <- do.call(sparseMatrix, dd)
    stopifnot(all.equal(mm, CAex, tolerance=1e-15))

+2

Source: https://habr.com/ru/post/1333742/