Why does selectExpr change the schema (to include the id column)?

UPDATE (which makes everything below false and invalid)

I restored 2.2.0-SNAPSHOT with the latest changes from master, without my local changes to def schema in Dataset. IT WORKS. Sorry for the noise :(

$ ./bin/spark-shell --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0-SNAPSHOT
      /_/

Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_121
Branch master
Compiled by user jacek on 2017-03-27T19:00:06Z
Revision 3fada2f502107bd5572fb895471943de7b2c38e4
Url https://github.com/apache/spark.git
Type --help for more information.

scala> spark.range(1).printSchema
root
 |-- id: long (nullable = false)


scala> spark.range(1).selectExpr("*").printSchema
root
 |-- id: long (nullable = false)

While playing with selectExpr (in 2.2.0-SNAPSHOT built from today's master), I noticed that the schema changes to include an id column. I cannot explain it. Can anyone?

I can reproduce it every time I start spark-shell by following these steps:

scala> spark.version
res0: String = 2.2.0-SNAPSHOT

scala> spark.range(1).printSchema
root
 |-- value: long (nullable = true)

scala> spark.range(1).explain(true)
== Parsed Logical Plan ==
Range (0, 1, step=1, splits=Some(8))

== Analyzed Logical Plan ==
id: bigint
Range (0, 1, step=1, splits=Some(8))

== Optimized Logical Plan ==
Range (0, 1, step=1, splits=Some(8))

== Physical Plan ==
*Range (0, 1, step=1, splits=Some(8))

scala> spark.range(1).printSchema
root
 |-- value: long (nullable = true)

scala> spark.range(1).selectExpr("*").printSchema
root
 |-- id: long (nullable = false)

scala> val rangeDS = spark.range(1)
rangeDS: org.apache.spark.sql.Dataset[Long] = [value: bigint]

scala> rangeDS.selectExpr("*").printSchema
root
 |-- id: long (nullable = false)

P.S. It looks like I cannot reproduce it in 2.1.0.


$ ./bin/spark-shell --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0-SNAPSHOT
      /_/

Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_121
Branch master
Compiled by user jacek on 2017-03-27T03:43:09Z
Revision 3fbf0a5f9297f438bc92db11f106d4a0ae568613
Url https://github.com/apache/spark.git
Type --help for more information.

For reference, selectExpr is defined as follows:

def selectExpr(exprs: String*): DataFrame = {
    select(exprs.map { expr =>
      Column(sparkSession.sessionState.sqlParser.parseExpression(expr))
    }: _*)
}

The String-based select, for comparison, is defined as:

def select(col: String, cols: String*): DataFrame = select((col +: cols).map(Column(_)) : _*)
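Both definitions delegate to the Column-based select, so on an unmodified build they should report the same schema as the Dataset itself. A minimal sketch for checking this outside spark-shell, assuming Spark 2.x is on the classpath and a local master is acceptable; the object name SchemaCheck and the helper schemas are hypothetical, not part of Spark:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

// Hypothetical helper object, used only for this illustration.
object SchemaCheck {
  // Collects the schema of the plain Dataset, of selectExpr("*") and of select("*").
  def schemas(spark: SparkSession): Seq[StructType] = {
    val ds = spark.range(1)
    Seq(ds.schema, ds.selectExpr("*").schema, ds.select("*").schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: run locally
      .appName("schema-check")
      .getOrCreate()
    try {
      val all = schemas(spark)
      all.foreach(s => println(s.treeString))
      // On a build without local modifications the three schemas are expected to match.
      assert(all.distinct.size == 1, "schemas differ")
    } finally {
      spark.stop()
    }
  }
}
```

If the assertion fails, the difference most likely comes from the build itself (for example, local changes to Dataset) rather than from selectExpr.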

, , SQL, , ,

In 2.2.0, the output is:

res7: String = 2.2.0
root
 |-- id: long (nullable = false)
root
 |-- id: long (nullable = false)

Source: https://habr.com/ru/post/1673283/

