How to convert a JSON file to Parquet using Apache Spark?

I am new to Apache Spark and am using version 1.3.1. How can I convert a JSON file to Parquet?

1 answer

Spark 1.4 and later

You can use Spark SQL to first read the JSON file into a DataFrame and then write the DataFrame out as a Parquet file.

val df = sqlContext.read.json("path/to/json/file")
df.write.parquet("path/to/parquet/file")

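or, using the generic save method (deprecated since Spark 1.4 in favor of df.write.format("parquet").save(...)):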

df.save("path/to/parquet/file", "parquet")

Check here and here for examples and more details.
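
As a rough, self-contained sketch of the same two steps outside the Spark shell (assuming Spark 1.4 to 1.6 on the classpath): the object name JsonToParquet, the convert helper, the local[*] master, and the placeholder paths are illustrative assumptions, not part of the original answer. Note that Spark's JSON reader in these versions expects one complete JSON object per line.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

// Hypothetical wrapper object; the name is an assumption for illustration.
object JsonToParquet {

  // Reads line-delimited JSON and writes it as Parquet.
  // SaveMode.Overwrite replaces any existing output at parquetPath.
  def convert(sqlContext: SQLContext, jsonPath: String, parquetPath: String): Unit = {
    val df = sqlContext.read.json(jsonPath) // schema is inferred from the data
    df.printSchema()                        // inspect the inferred schema
    df.write.mode(SaveMode.Overwrite).parquet(parquetPath)
  }

  def main(args: Array[String]): Unit = {
    // local[*] is assumed here so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("JsonToParquet").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = new SQLContext(sc)
      // Placeholder paths taken from the snippets above.
      convert(sqlContext, "path/to/json/file", "path/to/parquet/file")
    } finally {
      sc.stop()
    }
  }
}
```

In the Spark shell, sc and sqlContext already exist, so only the two lines inside convert are needed. The explicit save mode is a design choice for repeatable runs; drop it to keep the default behaviour of failing when the output path already exists.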

Spark 1.3.1

val df = sqlContext.jsonFile("path/to/json/file")
df.saveAsParquetFile("path/to/parquet/file")
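
To check that the conversion worked on 1.3.1, the Parquet output can be read back with sqlContext.parquetFile and its row count compared with the source. This is only a sketch under assumptions: the object name JsonToParquet131, the roundTrip helper, the local[*] master, and the placeholder paths are hypothetical, and the code targets the 1.3.x API, which later versions deprecate.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical wrapper object for the Spark 1.3.x API; the name is illustrative.
object JsonToParquet131 {

  // Converts JSON to Parquet, then reads the result back.
  // Returns (rows read from JSON, rows read back from Parquet).
  def roundTrip(sqlContext: SQLContext, jsonPath: String, parquetPath: String): (Long, Long) = {
    val df = sqlContext.jsonFile(jsonPath)  // 1.3.x JSON reader
    val sourceRows = df.count()
    df.saveAsParquetFile(parquetPath)       // 1.3.x Parquet writer
    val written = sqlContext.parquetFile(parquetPath)
    (sourceRows, written.count())
  }

  def main(args: Array[String]): Unit = {
    // local[*] is assumed so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("JsonToParquet131").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val (in, out) = roundTrip(new SQLContext(sc), "path/to/json/file", "path/to/parquet/file")
      println(s"JSON rows: $in, Parquet rows: $out")
    } finally {
      sc.stop()
    }
  }
}
```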

Windows and Spark 1.3.1 related issue

Saving a DataFrame as a Parquet file on Windows throws a java.lang.NullPointerException, as described here.



Source: https://habr.com/ru/post/1623904/

