There are two ways to do logistic regression in Spark: spark.ml and spark.mllib.

With DataFrames you can use spark.ml:
import org.apache.spark
import sqlContext.implicits._

// Build a LabeledPoint with a two-feature dense vector
def p(label: Double, a: Double, b: Double) =
  new spark.mllib.regression.LabeledPoint(
    label, new spark.mllib.linalg.DenseVector(Array(a, b)))

val data = sc.parallelize(Seq(p(1.0, 0.0, 0.5), p(0.0, 0.5, 1.0)))
val df = data.toDF

// Fit the model and score the training data
val model = new spark.ml.classification.LogisticRegression().fit(df)
model.transform(df).show
You get the raw predictions and probabilities:
+-----+---------+--------------------+--------------------+----------+
|label| features| rawPrediction| probability|prediction|
+-----+---------+--------------------+--------------------+----------+
| 1.0|[0.0,0.5]|[-19.037302860930...|[5.39764620520461...| 1.0|
| 0.0|[0.5,1.0]|[18.9861466274786...|[0.99999999431904...| 0.0|
+-----+---------+--------------------+--------------------+----------+
With RDDs you can use spark.mllib:
val model = new spark.mllib.classification.LogisticRegressionWithLBFGS().run(data)
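For example (a quick usage sketch, reusing the data RDD from above):

// predict takes a feature vector and returns only the class label as a Double
val pred = model.predict(new spark.mllib.linalg.DenseVector(Array(0.0, 0.5)))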
This model does not expose raw predictions or probabilities. You can take a look at predictPoint: it multiplies the feature vector by the weights and selects the class with the highest score. The weights are publicly available, so you can copy that algorithm and save the raw predictions instead of just returning the winning class.
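Here is a minimal sketch of that idea (the helper names are mine, not part of the API): recompute the margin from the model's public weights and intercept, which is the same dot product predictPoint thresholds, and apply the sigmoid to recover the positive-class probability.

// Hypothetical helpers mirroring predictPoint's binary case:
// margin = dot(weights, features) + intercept
def rawMargin(model: spark.mllib.classification.LogisticRegressionModel,
              features: spark.mllib.linalg.Vector): Double =
  model.weights.toArray.zip(features.toArray)
    .map { case (w, x) => w * x }
    .sum + model.intercept

// Sigmoid of the margin gives the positive-class probability
def probabilityOf(model: spark.mllib.classification.LogisticRegressionModel,
                  features: spark.mllib.linalg.Vector): Double =
  1.0 / (1.0 + math.exp(-rawMargin(model, features)))

Note that LogisticRegressionModel also has a clearThreshold() method; after calling it, predict returns the raw score instead of the thresholded class label.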