Are the MLlib-to-Breeze vector/matrix conversions private to the org.apache.spark.mllib scope?

Question


I read somewhere that MLlib's local vectors / matrices currently wrap the Breeze implementation, but the methods that convert MLlib vectors / matrices to Breeze ones are private to the org.apache.spark.mllib scope. The suggested workaround is to write your code in an org.apache.spark.mllib.something package.

Is there a better way to do this? Could you give some relevant examples?

Thanks and regards.


apache-spark apache-spark-mllib scala-breeze

learning_spark Oct 30 '14 at 10:08


6 answers

lev · Answer 1 · 2014-11-15T16:42:42+0000

I went with the same solution that @dlwh suggested. Here is the code that does it:

    package org.apache.spark.mllib.linalg

    object VectorPub {
      implicit class VectorPublications(val vector: Vector) extends AnyVal {
        def toBreeze: breeze.linalg.Vector[scala.Double] = vector.toBreeze
      }

      implicit class BreezeVectorPublications(val breezeVector: breeze.linalg.Vector[Double]) extends AnyVal {
        def fromBreeze: Vector = Vectors.fromBreeze(breezeVector)
      }
    }

Note that the implicit classes extend AnyVal to avoid allocating a new wrapper object each time these methods are called.

dlwh · Answer 2 · 2014-10-31T17:06:06+0000

As I understand it, the Spark developers do not want to expose third-party APIs (including Breeze), so that it is easier to change things if they decide to move away from them.

You can always put just a simple implicit conversion class in that package and write the rest of your code in your own package. It is not much better than putting everything in there, but it makes it a little more obvious why you are doing it.

javadba · Answer 3 · 2015-02-01T01:25:09+0000

Here is the best I have so far. Note to @dlwh: please do provide any improvements you might have.

The solution I could come up with, which does not put any code inside the mllib.linalg package, is to convert each Vector to a new Breeze DenseVector.

    import breeze.linalg.DenseVector
    import org.apache.spark.mllib.linalg.Vectors

    val v1 = Vectors.dense(1.0, 2.0, 3.0)
    val v2 = Vectors.dense(4.0, 5.0, 6.0)
    val bv1 = new DenseVector(v1.toArray)
    val bv2 = new DenseVector(v2.toArray)
    val vectout = Vectors.dense((bv1 + bv2).toArray)
    // vectout: org.apache.spark.mllib.linalg.Vector = [5.0,7.0,9.0]

corvi42 · Answer 4 · 2019-01-20T22:32:52+0000

My solution is a kind of hybrid of those from @barclar and @lev, above. You do not need to put your code in org.apache.spark.mllib.linalg unless you want to use Spark's own implicit conversions. You can define your own implicit conversions in your own package, for example:

    // "your.package" is a placeholder: `package` is a reserved word in Scala,
    // so substitute a real package name here.
    package your.package

    import org.apache.spark.ml.linalg.DenseVector
    import org.apache.spark.ml.linalg.SparseVector
    import org.apache.spark.ml.linalg.Vector
    import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}

    object BreezeConverters {
      implicit def toBreeze(dv: DenseVector): BDV[Double] =
        new BDV[Double](dv.values)

      implicit def toBreeze(sv: SparseVector): BSV[Double] =
        new BSV[Double](sv.indices, sv.values, sv.size)

      implicit def toBreeze(v: Vector): BV[Double] =
        v match {
          case dv: DenseVector => toBreeze(dv)
          case sv: SparseVector => toBreeze(sv)
        }

      implicit def fromBreeze(dv: BDV[Double]): DenseVector =
        new DenseVector(dv.toArray)

      implicit def fromBreeze(sv: BSV[Double]): SparseVector =
        new SparseVector(sv.length, sv.index, sv.data)

      implicit def fromBreeze(bv: BV[Double]): Vector =
        bv match {
          case dv: BDV[Double] => fromBreeze(dv)
          case sv: BSV[Double] => fromBreeze(sv)
        }
    }

You can then import these implicits into your code with:

 import your.package.BreezeConverters._

barclar · Answer 5 · 2017-02-04T21:03:58+0000

This solution avoids putting code into Spark's packages and avoids converting sparse vectors into dense ones:

    import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector}

    def toBreeze(vector: Vector): breeze.linalg.Vector[scala.Double] = vector match {
      case sv: SparseVector => new breeze.linalg.SparseVector[Double](sv.indices, sv.values, sv.size)
      case dv: DenseVector => new breeze.linalg.DenseVector[Double](dv.values)
    }
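For completeness, here is a rough sketch of the same idea with the reverse direction added, using only public members of both libraries. The object name PublicApiConversionSketch and the fromBreeze helper are my own additions, not part of the answer above. One assumption worth flagging: a Breeze SparseVector can carry spare capacity in its index and data arrays, so the sketch trims both to activeSize before building the MLlib vector.

```scala
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, Vectors}
import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}

// Hypothetical helper object; lives in your own package, not in Spark's.
object PublicApiConversionSketch {

  // MLlib -> Breeze, keeping sparse vectors sparse (same idea as the answer above).
  def toBreeze(v: Vector): BV[Double] = v match {
    case sv: SparseVector => new BSV[Double](sv.indices, sv.values, sv.size)
    case dv: DenseVector  => new BDV[Double](dv.values)
  }

  // Breeze -> MLlib. Sparse input stays sparse; anything else is copied to a dense vector.
  def fromBreeze(bv: BV[Double]): Vector = bv match {
    case sv: BSV[Double] =>
      // Only the first activeSize slots of index/data hold real entries.
      val n = sv.activeSize
      Vectors.sparse(sv.length, sv.index.take(n), sv.data.take(n))
    case other =>
      Vectors.dense(other.toArray)
  }
}
```

Note that toBreeze shares the underlying arrays with the MLlib vector rather than copying them, so mutating the Breeze vector in place would also change the original.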

Lamine lazreg · Answer 6 · 2018-05-12T21:34:13+0000

This is a method I would use to convert an MLlib DenseMatrix to a Breeze matrix; maybe it will help!

    import org.apache.spark.mllib.linalg.Matrix

    def toBreez(X: Matrix): breeze.linalg.DenseMatrix[Double] = {
      val m = breeze.linalg.DenseMatrix.zeros[Double](X.numRows, X.numCols)
      // Copy the matrix element by element.
      for (i <- 0 until X.numRows; j <- 0 until X.numCols) {
        m(i, j) = X(i, j)
      }
      m
    }
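A possible shortcut, sketched under the assumption that a dense result is what you want: MLlib's Matrix.toArray returns the entries in column-major order, which is also the layout Breeze's DenseMatrix constructor expects, so the array can be handed over directly instead of looping. The names MatrixConversionSketch, toBreezeDense and fromBreezeDense are made up for this example.

```scala
import org.apache.spark.mllib.linalg.{Matrices, Matrix}
import breeze.linalg.{DenseMatrix => BDM}

// Hypothetical helper object; uses only public API, so it can live in your own package.
object MatrixConversionSketch {

  // MLlib -> Breeze. toArray yields a column-major dense array,
  // which matches Breeze's default DenseMatrix layout.
  def toBreezeDense(m: Matrix): BDM[Double] =
    new BDM[Double](m.numRows, m.numCols, m.toArray)

  // Breeze -> MLlib. Breeze's toArray also returns column-major data.
  def fromBreezeDense(b: BDM[Double]): Matrix =
    Matrices.dense(b.rows, b.cols, b.toArray)
}
```

This always produces a dense matrix, so a large sparse MLlib matrix would be densified along the way.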
