Dynamically bind a variable/parameter in Spark SQL?

How do I bind a variable in Apache Spark SQL? For example:

    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    sqlContext.sql("SELECT * FROM src WHERE col1 = ${VAL1}").collect().foreach(println)
Score: +13
5 answers

As of version 1.6, Spark SQL does not support bind variables.

P.S. What Ashrit suggests is not a bind variable: you build a new query string every time, and each time Spark parses the query, creates an execution plan, and so on. The purpose of bind variables (for example, in RDBMSs) is to reduce the time spent creating an execution plan, which can be expensive when there are many joins, etc. Spark would need a special API to parse a query once and then bind variables to it. Spark does not have this functionality (as of the Spark 1.6 release).

Update 8/2018: as of Spark 2.3, Spark still has no bind variables.
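For contrast, here is what bind variables look like in an RDBMS client API. Python's built-in sqlite3 module is used purely as an example and has nothing to do with Spark. The SQL text contains a ? placeholder and the value is passed separately. Because of this, the statement text is identical on every call, and the module can reuse the compiled statement from its per-connection cache, which is keyed by SQL text. The table and rows below are hypothetical.

```python
import sqlite3

# In-memory database with a toy table standing in for "src" (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (col1 TEXT, col2 INTEGER)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [("a", 1), ("b", 2), ("a", 3)])

# One SQL text with a "?" placeholder. The value is bound separately and is
# never spliced into the query string, so the statement text never changes.
QUERY = "SELECT col2 FROM src WHERE col1 = ? ORDER BY col2"


def rows_for(value):
    return [row[0] for row in conn.execute(QUERY, (value,))]


print(rows_for("a"))  # [1, 3]
print(rows_for("b"))  # [2]
```

Because the value is bound rather than pasted into the text, an input such as a' OR '1'='1 is compared as an ordinary string and matches no rows.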

Score: +14

I tested this in both the Spark 2.x shell and the Thrift server (via beeline). I was able to bind a variable in a Spark SQL query using the set command.

Query without variable binding:

    select count(1) from mytable;

Query with a bind variable (parameterized):

1. Spark SQL shell

    set key_tbl=mytable; -- assign mytable to key_tbl so it can be referenced as ${key_tbl}
    select count(1) from ${key_tbl};

2. Spark shell

    spark.sql("set key_tbl=mytable")
    spark.sql("select count(1) from ${key_tbl}").collect()

The query returns the same result with and without the bind parameter.

Note: do not put quotation marks around the key's value here, because the value is used as a table name.
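The set/${...} mechanism is text substitution that happens before the query is parsed. In Spark it is controlled by spark.sql.variable.substitute, which defaults to true. As the first answer points out, this makes it different from a true bind variable. The pure-Python sketch below is a simplified, hypothetical model of that substitution, not Spark's implementation. It handles only bare ${name} references, although real Spark also accepts prefixed forms such as ${hiveconf:name}. The model stores the value verbatim, which shows why a quoted value ends up as a quoted string where a table name is expected. The function name run_script is made up for illustration.

```python
import re

# Hypothetical, simplified model of Spark's ${...} variable substitution.
REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_.]*)\}")
SET = re.compile(r"\s*set\s+([A-Za-z_][A-Za-z0-9_.]*)\s*=\s*(.*?)\s*;?\s*", re.IGNORECASE)


def run_script(statements):
    """Return the SQL text each non-"set" statement would be handed to the parser as."""
    variables = {}
    expanded = []
    for stmt in statements:
        m = SET.fullmatch(stmt)
        if m:
            # Value stored verbatim, quotes included (an assumption of this model).
            variables[m.group(1)] = m.group(2)
            continue
        # Plain textual replacement; unknown references are left as-is in this sketch.
        expanded.append(REF.sub(lambda r: variables.get(r.group(1), r.group(0)), stmt))
    return expanded


print(run_script(["set key_tbl=mytable;", "select count(1) from ${key_tbl};"]))
# ['select count(1) from mytable;']
print(run_script(["set key_tbl='mytable';", "select count(1) from ${key_tbl};"]))
# ["select count(1) from 'mytable';"]
```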

Let me know if you have any questions.

Score: +10

If you are looking to pass a variable from within the same program or shell, you can use Scala string interpolation:

    val VAL1 = "testcol"
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    // s"..." splices VAL1 into the SQL text; unquoted, testcol is parsed as a column name
    sqlContext.sql(s"SELECT * FROM src WHERE col1 = $VAL1").collect().foreach(println)
Score: +1

PySpark:

    # str.format indices are zero-based: {0} -> VAL1, {1} -> VAL2
    query = "SELECT * FROM src WHERE col1 = {0} and col2 = {1}".format(VAL1, VAL2)
    for row in sqlContext.sql(query).collect():
        print(row)
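Python's str.format numbers positional placeholders from zero. With two arguments, the placeholders must therefore be {0} and {1}. Numbering them {1} and {2} moves VAL2 into the first slot and leaves {2} with no argument. Here is a quick check with hypothetical numeric values, chosen so that no SQL quoting is needed:

```python
VAL1, VAL2 = 1, 2  # hypothetical values

# Zero-based placeholders: {0} -> VAL1, {1} -> VAL2.
query = "SELECT * FROM src WHERE col1 = {0} and col2 = {1}".format(VAL1, VAL2)
print(query)  # SELECT * FROM src WHERE col1 = 1 and col2 = 2

# With {1} and {2}, {1} picks up VAL2 and {2} has no matching argument.
try:
    "SELECT * FROM src WHERE col1 = {1} and col2 = {2}".format(VAL1, VAL2)
    one_based_fails = False
except IndexError:
    one_based_fails = True
print(one_based_fails)  # True
```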
Score: 0

Try this, which wraps the value in quotes so that Spark treats it as a string literal:

 sqlContext.sql(s"SELECT * FROM src WHERE col1 = '${VAL1}'").collect().foreach(println) 
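Wrapping the value in quotes works until the value itself contains a quote or a backslash. Below is a minimal sketch of a hypothetical helper, sql_string_literal, that escapes both before adding the quotes. It is shown in Python, and the same logic carries over to the Scala snippets. It assumes Spark's default string-literal rules, where a backslash escapes the next character (spark.sql.parser.escapedStringLiterals is false by default). This is still string building rather than binding, so it reduces breakage from odd input but does not turn the query into a bind-variable query.

```python
def sql_string_literal(value: str) -> str:
    """Render a Python str as a single-quoted Spark SQL string literal.

    Assumes spark.sql.parser.escapedStringLiterals = false (the default),
    so backslashes and single quotes are escaped with a backslash.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


VAL1 = "O'Brien"  # hypothetical value containing a quote
query = "SELECT * FROM src WHERE col1 = " + sql_string_literal(VAL1)
print(query)  # SELECT * FROM src WHERE col1 = 'O\'Brien'
```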
Score: 0

Source: https://habr.com/ru/post/977762/

