Problem with VectorUDT when using Spark ML

I am writing UDAF to apply to a column of a Spark data frame of type Vector (spark.ml.linalg.Vector). I rely on the spark.ml.linalg package, so I don't have to go back and forth between dataframe and RDD.

Inside UDAF, I have to specify the data type for input, buffer, and output schemes:

def inputSchema = new StructType().add("features", new VectorUDT())
def bufferSchema: StructType =
    StructType(StructField("list_of_similarities", ArrayType(new VectorUDT(), true), true) :: Nil)

override def dataType: DataType = ArrayType(DoubleType,true) 

VectorUDT is what I will use with spark.mllib.linalg.Vector: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/ linalg / Vectors.scala

However, when I try to import it from spark.ml instead: import org.apache.spark.ml.linalg.VectorUDT I get a runtime error (no errors during build):

class VectorUDT in package linalg cannot be accessed in package org.apache.spark.ml.linalg 

Is / expected / can you suggest a workaround?

I am using Spark 2.0.0

+4
1

Spark 2.0.0 org.apache.spark.ml.linalg.SQLDataTypes.VectorType VectorUDT. .

+16

Source: https://habr.com/ru/post/1651442/


All Articles