Spark newbie question: I am running exactly the same Spark SQL query in spark-sql and in spark-shell. The spark-shell version takes about 10 seconds, and the spark-sql version takes about 20.
In the spark-sql REPL, I enter the query directly:
spark-sql> SELECT .... FROM .... LIMIT 20
In the spark-shell REPL, the commands are as follows:
scala> val df = sqlContext.sql("SELECT ... FROM ... LIMIT 20 ")
scala> df.show()
In both cases, it is exactly the same query. In addition, the query returns only a few rows because of the explicit LIMIT 20.
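To make the comparison more concrete, here is a minimal sketch of a timing helper that could be pasted into spark-shell. The name `time` is hypothetical and not part of Spark; it simply measures wall-clock time around any block of code.

```scala
// Hypothetical helper (not a Spark API): measures the wall-clock time of a block.
def time[A](label: String)(block: => A): A = {
  val start = System.nanoTime()
  val result = block                       // the by-name block is evaluated here
  val elapsedMs = (System.nanoTime() - start) / 1e6
  println(f"$label took $elapsedMs%.1f ms")
  result
}

// Plain-Scala example so the snippet runs on its own.
val total = time("sum") { (1 to 1000).sum }
```

In spark-shell, `time("show") { df.show() }` would time only the query itself, and `df.explain(true)` prints the plans, which could be compared with the output of `EXPLAIN EXTENDED` on the same statement in spark-sql. This assumes that `df` is the DataFrame defined above.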
What is different about how the same query is executed from the two CLIs?
I am working on a Sandware VM (Linux CentOS), if that helps.