Which one is faster: Spark SQL with a WHERE clause, or using a filter on a DataFrame after Spark SQL?

Like this:

 select col1, col2 from tab1 where col1 = val;

or

 DataFrame df = sqlContext.sql("select col1, col2 from tab1");

 df.filter("col1 = val");

1 answer

Calling the explain method to inspect the physical plan is a good way to compare performance.

For example, using data from the Zeppelin Tutorial:

 sqlContext.sql("select age, job from bank").filter("age = 30").explain 

and

 sqlContext.sql("select age, job from bank where age = 30").explain 

have exactly the same physical plan:

 == Physical Plan ==
 Project [age#5,job#6]
 +- Filter (age#5 = 30)
    +- Scan ExistingRDD[age#5,job#6,marital#7,education#8,balance#9]

Thus, performance should be the same.
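To see why the two forms converge, here is a toy sketch of how a lazy query engine like Spark's Catalyst can build and normalize plans. This is an illustrative model only, not Spark's actual implementation; all class and function names here are invented. The point it demonstrates: a filter applied on top of a projection can be pushed below it, so both query shapes end up as the same plan tree.

```python
# Toy model of lazy plan building: both "SQL with WHERE" and
# "SQL then .filter()" produce the same plan tree, which is why
# their physical plans (and performance) match in Spark.
# Invented names; not Spark's real Catalyst API.

class Plan:
    def __init__(self, op, child=None, arg=None):
        self.op, self.child, self.arg = op, child, arg

    def describe(self):
        # Render the plan bottom-up, loosely like explain() output.
        lines = []
        node = self
        while node:
            lines.append(f"{node.op}({node.arg})" if node.arg else node.op)
            node = node.child
        return " <- ".join(reversed(lines))

def scan(table):
    return Plan("Scan", arg=table)

def project(child, cols):
    return Plan("Project", child, cols)

def filter_(child, cond):
    # Simplified pushdown rule: a filter on top of a projection is
    # moved below it (assuming the condition only uses projected columns).
    if child.op == "Project":
        return Plan("Project", Plan("Filter", child.child, cond), child.arg)
    return Plan("Filter", child, cond)

# "select age, job from bank where age = 30"
sql_where = project(filter_(scan("bank"), "age = 30"), "age, job")

# sql("select age, job from bank") followed by .filter("age = 30")
sql_then_filter = filter_(project(scan("bank"), "age, job"), "age = 30")

print(sql_where.describe())
print(sql_then_filter.describe())
# Both print: Scan(bank) <- Filter(age = 30) <- Project(age, job)
```

The design choice this mirrors is deferred execution: because nothing runs until an action is triggered, the optimizer sees the whole plan and can rewrite either form into the same physical plan.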

In my opinion, select age, job from bank where age = 30 is more readable in this case.


Source: https://habr.com/ru/post/1259407/

