Spark SQL Parquet File

I am trying to use Spark SQL with the Parquet file format. When I try the main example:

import org.apache.spark.{SparkConf, SparkContext}

object parquet {

  case class Person(name: String, age: Int)

  def main(args: Array[String]) {

    val sparkConf = new SparkConf().setMaster("local").setAppName("HdfsWordCount")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    // createSchemaRDD is used to implicitly convert an RDD to a SchemaRDD.
    import sqlContext.createSchemaRDD

    val people = sc.textFile("C:/Users/pravesh.jain/Desktop/people/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
    people.saveAsParquetFile("C:/Users/pravesh.jain/Desktop/people/people.parquet")

    val parquetFile = sqlContext.parquetFile("C:/Users/pravesh.jain/Desktop/people/people.parquet")
  }
}
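
(For context, with the Spark 1.x SQLContext API above, the loaded Parquet file would typically be queried by registering it as a table; these lines would go inside main after the parquetFile line. registerTempTable assumes Spark 1.1+, where it replaced Spark 1.0's registerAsTable.)

    // Register the Parquet data as a table and query it with SQL.
    parquetFile.registerTempTable("people")
    val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
    teenagers.map(t => "Name: " + t(0)).collect().foreach(println)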

I get a null pointer exception:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.spark.parquet$.main(parquet.scala:16)

which points to the line with saveAsParquetFile. What is the problem?

+2
2 answers

This error occurred when I ran Spark from Eclipse on Windows. I tried the same thing in spark-shell and it works fine. I suspect Spark is not 100% compatible with Windows.
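
A common cause of NullPointerExceptions from Spark/Hadoop file operations on Windows is a missing winutils.exe. If that is the cause here, a frequently suggested workaround is to point hadoop.home.dir at a folder containing bin\winutils.exe before creating the SparkContext; the path below is only an example:

    // Workaround sketch, assuming the NPE comes from Hadoop's missing winutils.exe.
    // Place winutils.exe in C:/hadoop/bin (example path) and set this property
    // before the SparkContext is created.
    System.setProperty("hadoop.home.dir", "C:/hadoop")
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("HdfsWordCount"))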

+1

Spark has known problems on Windows. On Windows, spark-submit has issues with the "-master" option (a Windows-specific problem). I have run Spark Java applications from Eclipse and Spark worked. If possible, avoid running Spark on Windows.
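
For reference, an invocation with an explicit master would look something like this (the class and jar names are placeholders for your own build output):

    spark-submit --class parquet --master local[2] parquet-example.jar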

+1

Source: https://habr.com/ru/post/1623909/

