Convert org.apache.spark.mllib.linalg.Vector RDD to DataFrame in Spark using Scala

I have an RDD of org.apache.spark.mllib.linalg.Vector whose entries look like [Int Int Int]. I am trying to convert it to a DataFrame using this code:

import sqlContext.implicits._
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.types.StructField
import org.apache.spark.sql.types.DataTypes
import org.apache.spark.sql.types.ArrayData

vectrdd is an RDD of org.apache.spark.mllib.linalg.Vector

val vectarr = vectrdd.toArray()
case class RFM(Recency: Integer, Frequency: Integer, Monetary: Integer)
val df = vectarr.map { case Array(p0, p1, p2) => RFM(p0, p1, p2) }.toDF()

I get the following warning and error:

warning: fruitless type test: a value of type org.apache.spark.mllib.linalg.Vector cannot also be a Array[T]
val df = vectarr.map { case Array(p0, p1, p2) => RFM(p0, p1, p2) }.toDF()

error: pattern type is incompatible with expected type;
found   : Array[T]
required: org.apache.spark.mllib.linalg.Vector
val df = vectarr.map { case Array(p0, p1, p2) => RFM(p0, p1, p2) }.toDF()

The second method I tried is

val vectarr=vectrdd.toArray().take(2)
case class RFM(Recency: String, Frequency: String, Monetary: String)
val df = vectrdd.map { case (t0, t1, t2) => RFM(p0, p1, p2) }.toDF()

I got this error

error: constructor cannot be instantiated to expected type;
found   : (T1, T2, T3)
required: org.apache.spark.mllib.linalg.Vector
val df = vectrdd.map { case (t0, t1, t2) => RFM(p0, p1, p2) }.toDF()

I used this example as a guide -> Convert RDD to DataFrame in Spark / Scala

1 answer

vectarr will have the type Array[org.apache.spark.mllib.linalg.Vector], so in the pattern match Array(p0, p1, p2) the value being matched is a Vector, not an Array, which is why the pattern can never match.

Also, you should not do val vectarr = vectrdd.toArray() - this turns the RDD into an Array, and the final call to toDF then no longer compiles, since toDF only works on RDDs.

The correct line would be (if you change RFM to hold Doubles):

val df = vectrdd.map(_.toArray).map { case Array(p0, p1, p2) => RFM(p0, p1, p2)}.toDF()

In other words, replace val vectarr = vectrdd.toArray() (which has type Array[Vector]) with val arrayRDD = vectrdd.map(_.toArray()) (which has type RDD[Array[Double]]).
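For completeness, here is a minimal end-to-end sketch of the whole flow. It assumes a Spark 1.x spark-shell where sc and sqlContext are already in scope; the sample vectors are made up for illustration.

import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import sqlContext.implicits._

// mllib Vectors always store Doubles, so the case class takes Doubles
case class RFM(Recency: Double, Frequency: Double, Monetary: Double)

// hypothetical sample data standing in for the real vectrdd
val vectrdd: RDD[Vector] = sc.parallelize(Seq(
  Vectors.dense(10.0, 5.0, 300.0),
  Vectors.dense(3.0, 12.0, 450.0)
))

// stay inside the RDD: Vector -> Array[Double] -> RFM, then call toDF
val df = vectrdd
  .map(_.toArray)
  .map { case Array(p0, p1, p2) => RFM(p0, p1, p2) }
  .toDF()

df.show()

Note that Vector.toArray always yields Array[Double]; if you really want integer columns as in the original RFM, convert inside the map (e.g. p0.toInt) and declare the case class fields as Int.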


Source: https://habr.com/ru/post/1623495/

