SparkSession initialization error - Cannot use spark.read

I am trying to write a standalone PySpark program that reads a CSV file and saves it to a Hive table. I am having trouble setting up the SparkSession, SparkConf, and SparkContext objects. Here is my code:

    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext, SparkSession
    from pyspark.sql.types import *

    conf = SparkConf().setAppName("test_import")
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)
    spark = SparkSession.builder.config(conf=conf)

    dfRaw = spark.read.csv("hdfs:/user/..../test.csv", header=False)
    dfRaw.createOrReplaceTempView('tempTable')
    sqlContext.sql("create table customer.temp as select * from tempTable")

And I get the error:

    dfRaw = spark.read.csv("hdfs:/user/../test.csv", header=False)
    AttributeError: 'Builder' object has no attribute 'read'

How do I properly configure the SparkSession object so that I can use read.csv? Also, can someone explain the difference between the SparkSession, SparkContext, and SparkConf objects?

1 answer

There is no need to use both SparkContext and SparkSession to initialize Spark. SparkSession is the newer, recommended entry point.

To initialize your environment, simply do:

    spark = SparkSession\
        .builder\
        .appName("test_import")\
        .getOrCreate()
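
Applied to the program from the question, a minimal sketch might look like this (the HDFS path is left elided as in the question, and enableHiveSupport() is assumed to be needed since the target is a Hive table):

    from pyspark.sql import SparkSession

    # build a single SparkSession; enableHiveSupport() lets
    # "create table" write through the Hive metastore
    spark = SparkSession\
        .builder\
        .appName("test_import")\
        .enableHiveSupport()\
        .getOrCreate()

    # note: without .getOrCreate(), "spark" would still be a Builder,
    # which is exactly what the AttributeError complains about
    dfRaw = spark.read.csv("hdfs:/user/..../test.csv", header=False)
    dfRaw.createOrReplaceTempView('tempTable')
    spark.sql("create table customer.temp as select * from tempTable")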

You can run SQL commands by doing the following:

 spark.sql(...) 
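
For example, a small sketch assuming the temp view from the question has already been registered:

    # spark.sql returns a DataFrame, so results can be inspected directly
    result = spark.sql("select * from tempTable limit 10")
    result.show()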

Prior to Spark 2.0.0, three separate objects were used: SparkContext, SQLContext and HiveContext. They were used separately depending on what you wanted to do and the data types you were working with.

With the introduction of the Dataset/DataFrame abstractions, the SparkSession object became the main entry point into the Spark environment. You can still reach the other objects by first initializing a SparkSession (say, in a variable named spark) and then accessing spark.sparkContext / spark.sqlContext.
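
For instance, a minimal sketch of reaching the legacy SparkContext through an existing session (the RDD example is purely illustrative):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("test_import").getOrCreate()

    # the underlying SparkContext is still available for RDD-level work
    sc = spark.sparkContext
    rdd = sc.parallelize([1, 2, 3])
    print(rdd.count())  # 3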


Source: https://habr.com/ru/post/1272843/

