Convert RDD vector to LabeledPoint using Scala - MLLib in Apache Spark

I'm using MLlib from Apache Spark with Scala. I need to convert a group of vectors

  import org.apache.spark.mllib.linalg.{Vector, Vectors}
  import org.apache.spark.mllib.regression.LabeledPoint

into LabeledPoint in order to apply MLlib algorithms.
Each vector consists of double values of 0.0 (false) or 1.0 (true). All the vectors are stored in an RDD, so the final RDD is of type

  val data_tmp: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] 

The vectors in the RDD are created with

  import org.apache.spark.mllib.linalg.distributed.RowMatrix

  def createArray(values: List[String]): Vector = {
    val arr: Array[Double] = new Array[Double](tags_table.size)
    tags_table.foreach(x => arr(x._2) = if (values.contains(x._1)) 1.0 else 0.0)
    Vectors.dense(arr)
  }

  /* each element of result is a List[String] */
  val data_tmp = result.map(x => createArray(x._2))
  val data: RowMatrix = new RowMatrix(data_tmp)
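Spark aside, the one-hot encoding that createArray performs can be sketched in plain Scala. The tag table and input values below are made-up examples standing in for tags_table and the elements of result:

```scala
// Hypothetical tag table: tag name -> column index (stand-in for tags_table).
val tagsTable: Map[String, Int] = Map("scala" -> 0, "spark" -> 1, "mllib" -> 2)

// Same logic as createArray, minus the Spark Vector wrapper:
// write 1.0 at the index of every tag present in the input list.
def encode(values: List[String]): Array[Double] = {
  val arr = new Array[Double](tagsTable.size)
  tagsTable.foreach { case (tag, idx) =>
    arr(idx) = if (values.contains(tag)) 1.0 else 0.0
  }
  arr
}

val encoded = encode(List("spark", "mllib"))
```

In the real code, wrapping the resulting array with Vectors.dense produces the dense Vector that goes into the RDD.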

How can I create a set of LabeledPoint from this RDD (data_tmp) or from the RowMatrix (data) so I can use MLlib algorithms? For example, I need to apply the linear SVM algorithm here.

1 answer

I found a solution:

  def createArray(values: List[String]): Vector = {
    val arr: Array[Double] = new Array[Double](tags_table.size)
    tags_table.foreach(x => arr(x._2) = if (values.contains(x._1)) 1.0 else 0.0)
    Vectors.dense(arr)
  }

  val data_tmp = result.map(x => createArray(x._2))
  val parsedData = data_tmp.map { line => LabeledPoint(1.0, line) }
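Note that this answer hardcodes the label 1.0 for every point. If each record in the RDD carries its own label, the label should be derived from the data instead. A minimal Spark-free sketch of that mapping, with a stand-in LabeledPoint case class and made-up records so it runs standalone:

```scala
// Minimal stand-in for org.apache.spark.mllib.regression.LabeledPoint,
// defined here only so the sketch runs without a Spark installation.
case class LabeledPoint(label: Double, features: Array[Double])

// Hypothetical labeled records: (label, encoded feature values),
// with a plain List standing in for the RDD.
val records: List[(Double, Array[Double])] = List(
  (1.0, Array(0.0, 1.0, 1.0)),
  (0.0, Array(1.0, 0.0, 0.0))
)

// Take the label from each record instead of hardcoding 1.0.
val parsedData: List[LabeledPoint] =
  records.map { case (label, features) => LabeledPoint(label, features) }
```

With the real Spark types, an RDD[LabeledPoint] built this way can be passed directly to SVMWithSGD.train(parsedData, numIterations) to train the linear SVM the question asks about.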

Source: https://habr.com/ru/post/977944/

