SparkSQL: reading a Parquet file directly

I am switching from Impala to SparkSQL using the following code to read a table:

my_data = sqlContext.read.parquet('hdfs://my_hdfs_path/my_db.db/my_table') 

How do I invoke SparkSQL here so that it returns the result of a query like:

    select col_A, col_B from my_table
2 answers

After creating the DataFrame from the Parquet file, you need to register it as a temporary table before you can run SQL queries against it:

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val df = sqlContext.read.parquet("src/main/resources/peopleTwo.parquet")
    df.printSchema

    // after registering as a table you will be able to run SQL queries
    df.registerTempTable("people")
    sqlContext.sql("select * from people").collect.foreach(println)
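Note that registerTempTable is deprecated since Spark 2.0 in favor of createOrReplaceTempView. As a minimal sketch of the same approach applied to the path from the question (the HDFS path and column names come from the question; the app name is an arbitrary placeholder):

    import org.apache.spark.sql.SparkSession

    // Spark 2.x: build a SparkSession instead of a SQLContext
    // ("parquet-sql" is just a placeholder app name)
    val spark = SparkSession.builder.appName("parquet-sql").getOrCreate()

    // path and column names are the ones from the question
    val df = spark.read.parquet("hdfs://my_hdfs_path/my_db.db/my_table")
    df.createOrReplaceTempView("my_table")
    spark.sql("select col_A, col_B from my_table").show()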

Using simple SQL

JSON, ORC, Parquet, and CSV files can be queried with SQL directly, without first creating a Spark DataFrame or registering a table:

    // This is Spark 2.x code; you can do the same with sqlContext as well
    import org.apache.spark.sql.SparkSession

    val spark: SparkSession = SparkSession.builder.master("set_the_master").getOrCreate()

    // note the backticks around the path (single quotes will not parse)
    spark.sql("select col_A, col_B from parquet.`hdfs://my_hdfs_path/my_db.db/my_table`")
      .show()
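If you don't need SQL at all, the same column projection can be done directly with the DataFrame API. A sketch, assuming the path and column names from the question:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.master("set_the_master").getOrCreate()

    // read the Parquet files and project the two columns; no table or view needed
    spark.read.parquet("hdfs://my_hdfs_path/my_db.db/my_table")
      .select("col_A", "col_B")
      .show()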