I would like to create a Spark SQL DataFrame from the results of a query executed on CSV data (on HDFS) using Apache Drill. I have successfully configured Spark SQL to connect to Drill via JDBC:
Map<String, String> connectionOptions = new HashMap<String, String>();
connectionOptions.put("url", args[0]);
connectionOptions.put("dbtable", args[1]);
connectionOptions.put("driver", "org.apache.drill.jdbc.Driver");
DataFrame logs = sqlc.read().format("jdbc").options(connectionOptions).load();
Spark SQL performs two queries: the first to get the schema, and the second to get the actual data:
SELECT * FROM (SELECT * FROM dfs.output.`my_view`) WHERE 1=0

SELECT "field1","field2","field3" FROM (SELECT * FROM dfs.output.`my_view`)
The first query succeeds, but in the second Spark wraps the field names in double quotes, which Drill does not support, so the query fails.
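One workaround I am considering (not yet tested) is registering a custom JdbcDialect so that Spark quotes identifiers with backticks, which Drill accepts, instead of double quotes. A sketch, assuming Spark 1.4+'s JdbcDialects API; the "jdbc:drill" URL prefix check and the class name are my own assumptions:

```java
import org.apache.spark.sql.jdbc.JdbcDialect;
import org.apache.spark.sql.jdbc.JdbcDialects;

// Custom dialect: quote identifiers with backticks (Drill's style)
// instead of Spark's default double quotes.
public class DrillDialect extends JdbcDialect {

    @Override
    public boolean canHandle(String url) {
        // Assumption: Drill JDBC URLs start with "jdbc:drill".
        return url != null && url.startsWith("jdbc:drill");
    }

    @Override
    public String quoteIdentifier(String colName) {
        return "`" + colName + "`";
    }
}
```

The dialect would then be registered once, before calling load(): JdbcDialects.registerDialect(new DrillDialect());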
Has anyone gotten this integration to work?
Thanks!