Spark SQL performance difference in spark-sql versus REPL spark-shell

Question

Spark newbie question: I am running exactly the same Spark SQL query in spark-sql and in spark-shell. The spark-shell version takes about 10 seconds, and the spark-sql version takes about 20.

In the spark-sql REPL, I enter the query directly:

spark-sql> SELECT .... FROM .... LIMIT 20

In the spark-shell REPL, the commands are as follows:

scala> val df = sqlContext.sql("SELECT ... FROM ... LIMIT 20 ") 
scala> df.show()

In both cases, this is the exact same query. In addition, the query returns only a few rows because of the explicit LIMIT 20.
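To compare the two CLIs on equal footing, it can help to time only the query itself rather than the whole session. The following is a minimal sketch, not anything provided by Spark: `timed` is a hypothetical helper name, and the summation is only a stand-in workload so the snippet runs without a cluster.

```scala
// Hypothetical helper (not part of Spark): runs a block of code and
// reports how long it took in wall-clock seconds.
def timed[A](label: String)(body: => A): (A, Double) = {
  val start = System.nanoTime()
  val result = body                               // evaluate the by-name block once
  val seconds = (System.nanoTime() - start) / 1e9
  println(f"$label took $seconds%.2f s")
  (result, seconds)
}

// Stand-in workload so the sketch runs anywhere a Scala REPL does.
val (sum, secs) = timed("sum") { (1 to 1000000).map(_.toLong).sum }
```

In spark-shell, the same helper could presumably wrap the query from the question, for example `timed("query") { sqlContext.sql("SELECT ... FROM ... LIMIT 20").show() }`. Running it more than once may help separate one-time startup costs from the time spent on the query.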

Why would the same query be executed differently depending on which CLI it is run from?

I am working on a sandbox virtual machine (Linux CentOS), if that helps.

apache-spark

wrschneider Feb 11 '16 at 2:43

1 answer

Moustafa Mahmoud · Answer 1 · 2019-05-26T15:20:39+0000

I think this comes down to two things:

- First, [...] spark-sql [...] shell [...] sql [...]
- Second, spark-sql [...] Spark-shell [...] spark-sql. [...] top [...] spark-shell [...] spark-sql.
