Model.predictProbabilities() for LogisticRegression in Spark?

I am running multiclass Logistic Regression (with LBFGS) with Spark 1.6.

Given x and the possible labels {1.0, 2.0, 3.0}, the final model will only output the best prediction, say 2.0.

If I am interested in knowing the second best prediction, say 3.0, how can I get that information?

In NaiveBayes I would use model.predictProbabilities(), which for each sample produces a vector with the probabilities of every possible outcome.
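
For reference, a minimal sketch of that NaiveBayes usage (toy data just for illustration, mllib API as of Spark 1.6):

import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Toy training set with the three labels from the question.
val nbData = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(1.0, 0.0)),
  LabeledPoint(2.0, Vectors.dense(0.0, 1.0)),
  LabeledPoint(3.0, Vectors.dense(1.0, 1.0))))

val nbModel = NaiveBayes.train(nbData)
// One probability per class, ordered as in nbModel.labels.
val probs = nbModel.predictProbabilities(Vectors.dense(1.0, 0.0))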

+4
2 answers

There are two ways to do logistic regression in Spark: spark.ml and spark.mllib.

With DataFrames you can use spark.ml:

import org.apache.spark
import sqlContext.implicits._

// Helper that builds a LabeledPoint with two features.
// In Spark 1.6, spark.ml still works with spark.mllib vectors and LabeledPoints.
def p(label: Double, a: Double, b: Double) =
  new spark.mllib.regression.LabeledPoint(
    label, new spark.mllib.linalg.DenseVector(Array(a, b)))

// A tiny two-point training set, turned into a DataFrame.
val data = sc.parallelize(Seq(p(1.0, 0.0, 0.5), p(0.0, 0.5, 1.0)))
val df = data.toDF

val model = new spark.ml.classification.LogisticRegression().fit(df)
model.transform(df).show

You get the raw predictions and probabilities:

+-----+---------+--------------------+--------------------+----------+
|label| features|       rawPrediction|         probability|prediction|
+-----+---------+--------------------+--------------------+----------+
|  1.0|[0.0,0.5]|[-19.037302860930...|[5.39764620520461...|       1.0|
|  0.0|[0.5,1.0]|[18.9861466274786...|[0.99999999431904...|       0.0|
+-----+---------+--------------------+--------------------+----------+
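
If you want more than the single best class, you can read the probability vector yourself. A rough sketch, assuming the default spark.ml column names and Spark 1.6's mllib vector type:

import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.sql.Row

// For each row, sort the class indices by descending probability: the first
// entry is the predicted class, the second one is the "second best" class.
val ranked = model.transform(df).select("probability", "prediction").map {
  case Row(probability: Vector, prediction: Double) =>
    val classesByProb = probability.toArray.zipWithIndex.sortBy(-_._1).map(_._2)
    (prediction, classesByProb.toSeq)
}
ranked.collect().foreach(println)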

With RDD, you can use spark.mllib:

val model = new spark.mllib.classification.LogisticRegressionWithLBFGS().run(data)

This model does not expose the raw predictions and probabilities. You can take a look at predictPoint. It multiplies the vectors and picks the class with the highest prediction. The weights are publicly accessible, so you can copy that algorithm and save all the predictions instead of just returning the highest one.
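
For the three-class case from the question, this is roughly what the trained mllib model exposes (a sketch; LogisticRegressionWithLBFGS expects 0-based labels, i.e. {0.0, 1.0, 2.0}, and multiData is an assumed RDD[LabeledPoint] name):

val lr = new spark.mllib.classification.LogisticRegressionWithLBFGS().setNumClasses(3)
val mllibModel = lr.run(multiData)

mllibModel.weights      // weights of the numClasses - 1 non-reference classes, flattened into one vector
mllibModel.intercept    // a single Double; see predictPoint for how it is applied
mllibModel.numClasses   // 3
mllibModel.numFeatures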

+3

Following @Daniel Darabos' advice:

import scala.collection.mutable.HashMap
import org.apache.spark.mllib.linalg.Vector

def predictPointForMulticlass(
    featurizedVector: Vector,
    weightsArray: Vector,
    intercept: Double,
    numClasses: Int,
    numFeatures: Int): HashMap[Int, Double] = {

  val weightsArraySize = weightsArray.size
  // Size of one class's weight block; it includes a bias slot if the model was trained with an intercept.
  val dataWithBiasSize = weightsArraySize / (numClasses - 1)
  val withBias = dataWithBiasSize == numFeatures + 1

  var bestClass = 0
  var maxMargin = 0.0
  val margins = new Array[Double](numClasses - 1)
  val temp_marginMap = new HashMap[Int, Double]()
  val res = new HashMap[Int, Double]()

  // Compute the margin of every non-reference class (class 0 is the reference and has margin 0).
  (0 until numClasses - 1).foreach { i =>
    var margin = 0.0
    var index = 0
    featurizedVector.toArray.foreach { value =>
      if (value != 0.0) {
        margin += value * weightsArray((i * dataWithBiasSize) + index)
      }
      index += 1
    }
    // The intercept has to be added to the margin.
    if (withBias) {
      margin += weightsArray((i * dataWithBiasSize) + featurizedVector.size)
    }
    margins(i) = margin
    temp_marginMap += (i -> margin)

    if (margin > maxMargin) {
      maxMargin = margin
      bestClass = i + 1
    }
  }

  // Turn every margin into a probability-like score relative to the best margin.
  for ((k, v) <- temp_marginMap) {
    res += (k -> probCalc(maxMargin, v))
  }

  res
}

where probCalc() is:

def probCalc(maxMargin: Double, margin: Double): Double = {
  1.0 / (1.0 + Math.exp(-(margin - maxMargin)))
}
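
A hypothetical call, assuming a trained LogisticRegressionWithLBFGS model and a featurized test vector (testVector) from your own pipeline:

// Keys in the result are 0-based indices over the non-reference classes
// (class i + 1 in mllib terms); class 0 is the reference class and is not included.
val scores = predictPointForMulticlass(
  testVector, model.weights, model.intercept, model.numClasses, model.numFeatures)

// Rank the classes by score to get the best and second best one.
val rankedKeys = scores.toSeq.sortBy(-_._2).map(_._1)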

It returns a HashMap[Int, Double] with a probability-like score for each class.

Hope it helps!

+2

Source: https://habr.com/ru/post/1627822/

