How to calculate inverse of RowMatrix in Apache Spark?

I have a distributed matrix X in the form of a RowMatrix. I am using Spark 1.3.0. I need to calculate the inverse of X.

3 answers
import org.apache.spark.mllib.linalg.{Vectors, Vector, Matrix, SingularValueDecomposition, DenseMatrix, DenseVector}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

def computeInverse(X: RowMatrix): DenseMatrix = {
  val nCoef = X.numCols.toInt
  val svd = X.computeSVD(nCoef, computeU = true)
  if (svd.s.size < nCoef) {
    sys.error("RowMatrix.computeInverse called on singular matrix.")
  }

  // Create the inverse diagonal matrix from S
  val invS = DenseMatrix.diag(new DenseVector(svd.s.toArray.map(x => math.pow(x, -1))))

  // U cannot stay a RowMatrix, so collect it into a local DenseMatrix.
  // DenseMatrix is column-major, so filling it with the row-major values of U
  // effectively stores U transposed (X is square here), which is what we need below.
  val U = new DenseMatrix(svd.U.numRows().toInt, svd.U.numCols().toInt,
    svd.U.rows.collect.flatMap(x => x.toArray))

  // If you could make V distributed, this might be better. However, it's already local, so this is probably fine.
  val V = svd.V

  // inv(X) = V * inv(S) * transpose(U) --- U is already transposed (see above).
  (V.multiply(invS)).multiply(U)
}
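For context, here is a minimal usage sketch (not part of the original answer; the 2 x 2 values and the sc SparkContext from the shell are my own assumptions): build a small RowMatrix, invert it, and multiply back to check against the identity.

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Illustrative data only: a small, well-conditioned 2 x 2 matrix.
val rows = sc.parallelize(Seq(
  Vectors.dense(4.0, 3.0),
  Vectors.dense(6.0, 3.0)
))
val X = new RowMatrix(rows)

val Xinv = computeInverse(X)

// X * inv(X) should be close to the 2 x 2 identity (up to floating-point error).
X.multiply(Xinv).rows.collect().foreach(println)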

I had trouble using this function together with the option

 conf.set("spark.sql.shuffle.partitions", "12")

because the rows of the RowMatrix end up shuffled out of order.
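One workaround (my own sketch, not from the answer) is to pin the row order by attaching explicit indices before any shuffle can reorder anything, and then use the IndexedRowMatrix version below. Note that zipWithIndex assumes the RDD's current ordering is the one you want to preserve.

import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix, RowMatrix}

// Attach an explicit row index so later shuffles cannot scramble the row order.
def toIndexed(X: RowMatrix): IndexedRowMatrix =
  new IndexedRowMatrix(X.rows.zipWithIndex.map { case (v, i) => IndexedRow(i, v) })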

Here is the updated version that worked for me:

import org.apache.spark.mllib.linalg.{DenseMatrix, DenseVector}
import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix

def computeInverse(X: IndexedRowMatrix): DenseMatrix = {
  val nCoef = X.numCols.toInt
  val svd = X.computeSVD(nCoef, computeU = true)
  if (svd.s.size < nCoef) {
    sys.error("IndexedRowMatrix.computeInverse called on singular matrix.")
  }

  // Create the inverse diagonal matrix from S
  val invS = DenseMatrix.diag(new DenseVector(svd.s.toArray.map(x => math.pow(x, -1))))

  // Collect U as a local matrix (multiplying by the identity converts the generic
  // local Matrix into a DenseMatrix), then transpose it.
  val U = svd.U.toBlockMatrix().toLocalMatrix()
    .multiply(DenseMatrix.eye(svd.U.numRows().toInt)).transpose

  val V = svd.V

  // inv(X) = V * inv(S) * transpose(U)
  (V.multiply(invS)).multiply(U)
}
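A quick usage sketch for this variant (again, the 2 x 2 data and sc are illustrative assumptions, not from the answer):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

// Each row carries an explicit index, so row order is stable across shuffles.
val indexedRows = sc.parallelize(Seq(
  IndexedRow(0L, Vectors.dense(4.0, 3.0)),
  IndexedRow(1L, Vectors.dense(6.0, 3.0))
))
val X = new IndexedRowMatrix(indexedRows)
val Xinv = computeInverse(X)

// X * inv(X) should again be close to the identity.
X.multiply(Xinv).rows.collect().sortBy(_.index).foreach(println)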

The matrix U returned by X.computeSVD has dimensions m × k, where m is the number of rows of the original (distributed) RowMatrix X. One would expect m to be large (possibly much larger than k), so it is not practical to collect it on the driver if we want the code to scale to really large values of m.

I would say that both of the solutions above suffer from this drawback. The answer by @Alexander Kharlamov calls val U = svd.U.toBlockMatrix().toLocalMatrix(), which collects the matrix on the driver. The same thing happens in the answer by @Climbs_lika_Spyder (by the way, your nick rocks!!), which calls svd.U.rows.collect.flatMap(x => x.toArray). I would rather rely on a distributed matrix multiplication, such as the Scala code posted here.
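The linked code is not reproduced here, but as a rough sketch of the idea (my own, under the assumption that your Spark version provides IndexedRowMatrix.multiply(B: Matrix)): since inv(X) = V * inv(S) * transpose(U), you can avoid collecting U by multiplying the distributed U on the right by the small local matrix transpose(V * inv(S)); the result is the transpose of inv(X), kept distributed.

import org.apache.spark.mllib.linalg.{DenseMatrix, DenseVector}
import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix

// Sketch: returns the TRANSPOSE of inv(X) as a distributed IndexedRowMatrix,
// so the large m x k factor U never has to be collected on the driver.
def computeInverseTranspose(X: IndexedRowMatrix): IndexedRowMatrix = {
  val nCoef = X.numCols.toInt
  val svd = X.computeSVD(nCoef, computeU = true)
  require(svd.s.size == nCoef, "computeInverseTranspose called on a singular matrix")

  val invS = DenseMatrix.diag(new DenseVector(svd.s.toArray.map(1.0 / _)))

  // V * inv(S) is only k x k (X is square), so transposing it locally is cheap;
  // U stays distributed and is only multiplied by this small local matrix.
  val vInvSt = svd.V.multiply(invS).transpose

  // U * transpose(V * inv(S)) = transpose(V * inv(S) * transpose(U)) = transpose(inv(X))
  svd.U.multiply(vInvSt)
}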


Source: https://habr.com/ru/post/986301/

