Spark DataFrame - read a delimiter-separated file using SQL?

Can a DataFrame be created from a CSV source file using SQL alone?

Is it possible to pass parameters through SQL to set the delimiter, null character, and quote character?

val df = spark.sql("SELECT * FROM csv.`csv/file/path/in/hdfs`")

I know this can be done with spark.read.format("csv").option("delimiter", "|"), but ideally I would not have to.
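For reference, the DataFrameReader route mentioned above looks like the following. This is a minimal sketch; the path is a placeholder, and the option values (pipe delimiter, empty-string null, double-quote) are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("csv-read").getOrCreate()

// Programmatic equivalent of what the question wants to express in SQL:
// delimiter, null marker, and quote character set via reader options.
val df = spark.read
  .format("csv")
  .option("delimiter", "|")   // field separator
  .option("nullValue", "")    // string treated as NULL (assumed value)
  .option("quote", "\"")      // quote character
  .load("csv/file/path/in/hdfs")
```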

Updated Information

It seems that I need to go the route using backticks.

When I try to pass OPTIONS, I get a parse error:

== SQL ==
SELECT * FROM 
csv.`csv/file/path/in/hdfs` OPTIONS (delimiter , "|" )
-----------------------------------^^^

Error in query:
mismatched input '(' expecting {<EOF>, ',', 'WHERE', 'GROUP', 'ORDER', 
'HAVING', 'LIMIT', 'JOIN', 'CROSS', 'INNER', 'LEFT', 'RIGHT', 'FULL', 
'NATURAL', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 
'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'ANTI'}

Although it's not a one-liner, the following may work for you:

spark.sql("CREATE TABLE some_table USING com.databricks.spark.csv OPTIONS (path \"csv/file/path/in/hdfs\", delimiter \"|\")")
val df = spark.sql("SELECT * FROM some_table")

Of course, you can skip the second step of loading into a DataFrame if you want to perform your SQL operations directly on some_table.
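On Spark 2.x and later the csv data source is built in, so you don't need the com.databricks.spark.csv package name, and a temporary view avoids persisting a table in the metastore. A minimal sketch of the same idea (path and option values are placeholders/assumptions):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("csv-sql").getOrCreate()

// Built-in 'csv' source; the view exists only for this session.
spark.sql("""
  CREATE TEMPORARY VIEW some_table
  USING csv
  OPTIONS (path "csv/file/path/in/hdfs", sep "|", header "true")
""")

val df = spark.sql("SELECT * FROM some_table")
```

Here `sep` is the delimiter option of the built-in csv source (`delimiter` is accepted as an alias); `nullValue` and `quote` can be added to the OPTIONS list the same way.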
