Spark DataFrame column name case sensitivity

When I query DataFrame data in the spark-shell (Spark 1.6), the column names are case insensitive. In spark-shell:

    val a = sqlContext.read.parquet("<my-location>")
    a.filter($"name" <=> "andrew").count()
    a.filter($"NamE" <=> "andrew").count()

Both of the above give me the correct count. But when I build this into a jar and run it via spark-submit, the code below fails, saying that NamE does not exist, since the underlying Parquet data was saved with the column named "name".

Fails:

    a.filter($"NamE" <=> "andrew").count()

Passes:

    a.filter($"name" <=> "andrew").count()

Am I missing something here? Is there a way to make the filtering case insensitive? I know I could select with lowercase aliases for all columns before filtering, but I would like to know why it behaves differently.

+4
3 answers

The catch is that in spark-shell you are not actually working with a plain SQLContext. By default spark-shell creates a HiveContext, even though the variable is named sqlContext:

    scala> sqlContext.getClass
    res3: Class[_ <: org.apache.spark.sql.SQLContext] = class org.apache.spark.sql.hive.HiveContext

In the jar you run through spark-submit, you are presumably creating a plain SQLContext. To quote @LostInOverflow's answer: Hive is case insensitive, while Parquet is not, so: a HiveContext resolves column names the way Hive does, case insensitively, and matches the Parquet column "name" no matter how you spell it. A plain SQLContext is case sensitive by default, which is why the same filter fails under spark-submit.
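
If you want the spark-shell behavior under spark-submit, one option is to create a HiveContext in the application itself, just as spark-shell does. A minimal sketch, assuming Spark 1.6 with the spark-hive dependency on the classpath (the object name and app name are invented for illustration):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object CaseInsensitiveApp {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("case-insensitive-demo"))
        // A HiveContext resolves column names case insensitively by default,
        // matching what spark-shell gives you out of the box.
        val sqlContext = new HiveContext(sc)
        import sqlContext.implicits._

        val a = sqlContext.read.parquet("<my-location>")
        // Resolves against the Parquet column "name" despite the odd casing.
        println(a.filter($"NamE" <=> "andrew").count())
      }
    }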

+6

Parquet is case sensitive when it stores and returns column information, while Hive is not. To quote:

    ... Hive is case insensitive, while Parquet is not, ...

You can try controlling the case yourself and lowercase all the column names:

    val b = df.toDF(df.columns.map(_.toLowerCase): _*)
    b.filter(...)
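
As a made-up illustration of the effect (the sample data and column names here are hypothetical):

    import sqlContext.implicits._

    // Start with a mixed-case column name, like the Parquet file in the question.
    val df = sqlContext.createDataFrame(Seq(("andrew", 1))).toDF("NamE", "Id")
    df.columns                                          // Array(NamE, Id)

    // Rebuild the DataFrame with every column name lowercased.
    val b = df.toDF(df.columns.map(_.toLowerCase): _*)
    b.columns                                           // Array(name, id)
    b.filter($"name" <=> "andrew").count()              // 1
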
+6

Try explicitly setting case sensitivity on the sqlContext. Turn it off with the statement below and see if that helps:

    sqlContext.sql("set spark.sql.caseSensitive=false")
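
If you would rather not go through SQL, the same property can be set programmatically on the context (a sketch, assuming a Spark 1.x SQLContext):

    // Same effect as the SET statement above: make column resolution ignore case.
    sqlContext.setConf("spark.sql.caseSensitive", "false")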

+1