Spark SQL performance difference between spark-sql and the spark-shell REPL

Spark newb question: I am running exactly the same Spark SQL query in spark-sql and in spark-shell. The spark-shell version takes about 10 seconds, and the spark-sql version takes about 20.

In the spark-sql REPL, I issue the query directly:

spark-sql> SELECT .... FROM .... LIMIT 20

In the spark-shell REPL, the commands are as follows:

scala> val df = sqlContext.sql("SELECT ... FROM ... LIMIT 20 ") 
scala> df.show()

In both cases, it is exactly the same query. In addition, the query returns only a few rows because of the explicit LIMIT 20.

What differs in how the same query is executed from the two CLIs?

I am working on a Sandware VM (Linux CentOS), if that helps.

1 answer

I think this comes down to two things:

  1. -, . spark-sql, . . , , shell sql,

  2. -, spark-sql . . Spark-shell , spark-sql. top , spark-shell , spark-sql.
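
One rough way to narrow this down from spark-shell is to print the query plan and time the action separately. This is only a sketch: it assumes the `sqlContext` that spark-shell predefines (as in the question), `timed` is a hypothetical helper and not part of Spark, and the query text stays elided exactly as in the question.

```scala
// Hypothetical helper (not a Spark API): runs a block and prints wall-clock time.
def timed[T](label: String)(block: => T): T = {
  val start = System.nanoTime()
  val result = block
  val elapsedMs = (System.nanoTime() - start) / 1e6
  println(f"$label: $elapsedMs%.1f ms")
  result
}

// Same query as in the question; the elided parts are left as-is.
val df = sqlContext.sql("SELECT ... FROM ... LIMIT 20")

// Print the logical and physical plans for the query.
df.explain(true)

// Time only the action that actually runs the query and prints the rows.
timed("df.show()") { df.show() }
```

In spark-sql, prefixing the same query with `EXPLAIN EXTENDED` prints a plan that can be compared against this one. If the plans match, the extra time may lie outside query execution itself.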


Source: https://habr.com/ru/post/1628325/