Spark Dataset select with TypedColumn

Looking at the select() function on a Spark Dataset, there are various generated function signatures:

(c1: TypedColumn[MyClass, U1],c2: TypedColumn[MyClass, U2] ....)

This seems to hint that I should be able to refer to the members of MyClass directly and be type safe, but I'm not sure how...

ds.select("member") of course works, and it seems like ds.select(_.member) might also work somehow?

+5
2 answers

In the Scala DSL for select, there are many ways to identify a Column:

  • From a symbol: 'name
  • From a string: $"name" or col(name)
  • From an expression: expr("nvl(name, 'unknown') as renamed")

To get a TypedColumn from a Column, you simply use myCol.as[T].

For example: ds.select(col("name").as[String])
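As a rough illustration of how the typed select overloads from the question line up with .as[T], here is a minimal sketch. The Person case class, the TypedSelectSketch object, and the local SparkSession are hypothetical names introduced only for this example; they do not come from the question. Note that the column name and the requested type are still plain runtime values, so the Scala compiler cannot check them against the fields of the case class.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, expr}

// Hypothetical record type, defined at the top level so Spark can derive an encoder.
case class Person(name: String, age: Int)

object TypedSelectSketch {
  // One TypedColumn gives back a Dataset of that single type.
  def names(spark: SparkSession, ds: Dataset[Person]): Dataset[String] = {
    import spark.implicits._
    ds.select(col("name").as[String])
  }

  // Two TypedColumns give back a Dataset of a tuple.
  def nameAndAge(spark: SparkSession, ds: Dataset[Person]): Dataset[(String, Int)] = {
    import spark.implicits._
    ds.select($"name".as[String], $"age".as[Int])
  }

  // An SQL expression can be turned into a TypedColumn in the same way.
  def upperNames(spark: SparkSession, ds: Dataset[Person]): Dataset[String] = {
    import spark.implicits._
    ds.select(expr("upper(name)").as[String])
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("typed-select-sketch").getOrCreate()
    import spark.implicits._

    val ds: Dataset[Person] = Seq(Person("Ann", 30), Person("Bob", 25)).toDS()
    names(spark, ds).show()
    nameAndAge(spark, ds).show()
    upperNames(spark, ds).show()

    spark.stop()
  }
}
```

The return types are what the TypedColumn signatures buy you: the result is a Dataset[String] or Dataset[(String, Int)] rather than an untyped DataFrame.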

+14

If you want the equivalent of ds.select(_.member), just use map:

case class MyClass(member: MyMember, foo: A, bar: B)
val ds: Dataset[MyClass] = ???
val members: Dataset[MyMember] = ds.map(_.member)

Edit: the argument for not using map.

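A more performant way of doing the same thing is a projection, without using map at all. You lose compile-time type checking, but in exchange you give the Catalyst query engine a chance to do something more optimised. As @Sim points out in a comment, the main optimisation is that the entire contents of MyClass do not have to be deserialised from Tungsten memory onto the JVM heap just to call the accessor, and the result of _.member does not have to be serialised back into Tungsten.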

To make the example more concrete, let's redefine the data model like this:

  // Make sure these are not nested classes
  // (i.e. define them in a top-level compilation unit).
  case class MyMember(something: Double)
  case class MyClass(member: MyMember, foo: Int, bar: String)

These need to be case classes so that SQLImplicits.newProductEncoder[T <: Product] can provide an implicit Encoder[MyClass], which the Dataset[T] API requires.

Now the example above can be rewritten as:

  // Assumes a SparkSession named spark with `import spark.implicits._` in scope.
  val ds: Dataset[MyClass] = Seq(MyClass(MyMember(1.0), 2, "three")).toDS()
  val membersMapped: Dataset[Double] = ds.map(_.member.something)

To see what happens behind the scenes, use explain():

membersMapped.explain()

== Physical Plan ==
*(1) SerializeFromObject [input[0, double, false] AS value#19]
+- *(1) MapElements <function1>, obj#18: double
   +- *(1) DeserializeToObject newInstance(class MyClass), obj#17: MyClass
      +- LocalTableScan [member#12, foo#13, bar#14]

This makes the need to serialise to and from Tungsten explicit.

Now let's get the same value using a projection[^1]:

val ds2: Dataset[Double] = ds.select($"member.something".as[Double])
ds2.explain()

== Physical Plan ==
LocalTableScan [something#25]

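That's it! A single step[^2]. No serialisation is involved other than the encoding of MyClass into the original Dataset.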

[^1]: The projection is written as $"member.something" rather than $"value.member.something" because Catalyst automatically projects the members of a single-column DataFrame.

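[^2]: To be fair, the asterisks * next to the steps in the first physical plan indicate that they will be implemented by a WholeStageCodegenExec, in which the steps become a single JVM function compiled on the fly, with its own runtime optimisations applied. So, in general, comparing physical plans without empirical performance testing can be misleading.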
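For anyone who wants to reproduce the two plans side by side, the snippets above can be combined into one self-contained sketch. The MapVersusProjection object, its method names, and the local SparkSession settings are assumptions added for illustration; the case classes and the two queries are the ones from this answer. The exact plan text and expression IDs printed by explain() depend on the Spark version, so no expected output is shown.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Top-level case classes, as in the answer, so that Spark can derive encoders for them.
case class MyMember(something: Double)
case class MyClass(member: MyMember, foo: Int, bar: String)

object MapVersusProjection {
  // Typed lambda: each row is deserialised into a MyClass instance before the accessor runs.
  def viaMap(spark: SparkSession, ds: Dataset[MyClass]): Dataset[Double] = {
    import spark.implicits._
    ds.map(_.member.something)
  }

  // Column projection: Catalyst selects the nested field without building MyClass objects.
  def viaProjection(spark: SparkSession, ds: Dataset[MyClass]): Dataset[Double] = {
    import spark.implicits._
    ds.select($"member.something".as[Double])
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("map-vs-projection").getOrCreate()
    import spark.implicits._

    val ds: Dataset[MyClass] = Seq(MyClass(MyMember(1.0), 2, "three")).toDS()

    // Print both physical plans for comparison.
    viaMap(spark, ds).explain()
    viaProjection(spark, ds).explain()

    spark.stop()
  }
}
```

Both methods return the same Dataset[Double]; only the physical plan differs, which is why measuring on realistic data is the safer way to decide between them.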

+5

Source: https://habr.com/ru/post/1649443/

