SparkSession / SparkContext cannot get Hadoop configuration

I am running Spark 2, Hive, and Hadoop on my local machine, and I want to use Spark SQL to read data from a Hive table.

Everything works fine when my Hadoop uses the default hdfs://localhost:9000, but if I switch to another port in core-site.xml:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9099</value>
</property>

Running a simple query spark.sql("select * from archive.tcsv3 limit 100").show() in spark-shell gives me an error:

ERROR metastore.RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)
.....
From local/147.214.109.160 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused;
.....

The AlreadyExistsException comes first and does not seem to affect the result.

I can make it work by stopping the existing SparkContext and creating a new one:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

sc.stop()                    // stop the SparkContext that spark-shell created
var sc = new SparkContext()  // build a fresh context from the current configuration
val session = SparkSession.builder().master("local").appName("test").enableHiveSupport().getOrCreate()
session.sql("show tables").show()

My question is: why did the original SparkSession / SparkContext not pick up the correct configuration, and how can I fix this? Thanks!


You can reach the Hadoop configuration of a SparkSession through session.sparkContext:

val session = SparkSession
  .builder()
  .appName("test")
  .enableHiveSupport()
  .getOrCreate()
import session.implicits._

session.sparkContext.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
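
The same call applies to the setting from the question; a minimal sketch, assuming the port from the core-site.xml shown above:

// Sketch: point the session's Hadoop configuration at the HDFS endpoint
// configured in core-site.xml (hdfs://localhost:9099 in the question)
session.sparkContext.hadoopConfiguration.set("fs.defaultFS", "hdfs://localhost:9099")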

So there is no need to create a separate SparkContext; you can take it from the SparkSession.
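
Another option (my assumption, not part of the original answer) is to set the value when the session is built: Spark copies any spark.hadoop.* property on the builder into the Hadoop configuration, so the running context never needs to be restarted.

import org.apache.spark.sql.SparkSession

// Sketch: a "spark.hadoop." prefix forwards the key to the Hadoop
// Configuration when the session is created
val session2 = SparkSession
  .builder()
  .master("local")
  .appName("test")
  .config("spark.hadoop.fs.defaultFS", "hdfs://localhost:9099")
  .enableHiveSupport()
  .getOrCreate()

session2.sql("show tables").show()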

