Spark SQL and Apache Drill integration via JDBC

I would like to create a Spark SQL DataFrame from the results of a query executed on CSV data (on HDFS) using Apache Drill. I have successfully configured Spark SQL to connect to Drill via JDBC:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.spark.sql.DataFrame;

    // "sqlc" is an existing SQLContext
    Map<String, String> connectionOptions = new HashMap<String, String>();
    connectionOptions.put("url", args[0]);      // Drill JDBC URL
    connectionOptions.put("dbtable", args[1]);  // table/view to read
    connectionOptions.put("driver", "org.apache.drill.jdbc.Driver");

    DataFrame logs = sqlc.read().format("jdbc").options(connectionOptions).load();
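For reference, here is a hedged sketch of what the two arguments might contain. The ZooKeeper address is a placeholder, not from the original post; the view name matches the queries below:

    // Hypothetical example values -- adjust to your cluster:
    String[] args = new String[] {
        "jdbc:drill:zk=localhost:2181", // Drill JDBC URL (ZooKeeper quorum)
        "dfs.output.`my_view`"          // table/view to expose as a DataFrame
    };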

Spark SQL performs two queries: the first to get the schema, and the second to get the actual data:

    SELECT * FROM (SELECT * FROM dfs.output.`my_view`) WHERE 1=0
    SELECT "field1","field2","field3" FROM (SELECT * FROM dfs.output.`my_view`)

The first query succeeds, but in the second Spark wraps the field names in double quotes, which Drill does not support, so the query fails.

Has anyone gotten this integration working?

Thanks!

1 answer

You can fix this by adding a custom JDBC dialect and registering it before using the JDBC connector:

    import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

    case object DrillDialect extends JdbcDialect {
      def canHandle(url: String): Boolean = url.startsWith("jdbc:drill:")

      // Return column names as-is so Spark does not double-quote them
      override def quoteIdentifier(colName: java.lang.String): java.lang.String = {
        colName
      }

      def instance = this
    }

    JdbcDialects.registerDialect(DrillDialect)
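Since the question's code is Java, the same fix can be sketched there too. This is a minimal sketch, assuming Spark's developer API org.apache.spark.sql.jdbc.JdbcDialect can be subclassed directly from Java; the class name JavaDrillDialect is just illustrative:

    import org.apache.spark.sql.jdbc.JdbcDialect;
    import org.apache.spark.sql.jdbc.JdbcDialects;

    public class JavaDrillDialect extends JdbcDialect {
        @Override
        public boolean canHandle(String url) {
            return url.startsWith("jdbc:drill:");
        }

        // Return identifiers unquoted: Drill rejects double-quoted column names
        @Override
        public String quoteIdentifier(String colName) {
            return colName;
        }
    }

    // Register once, before sqlc.read().format("jdbc")...load():
    JdbcDialects.registerDialect(new JavaDrillDialect());

With the dialect registered, the data query Spark generates becomes SELECT field1,field2,field3 FROM ... without the double quotes, so Drill accepts it.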
