The error occurred because the textFile method from SparkContext returned an RDD, and I needed a DataFrame.
SparkSession has an SQLContext under the hood. Therefore, I needed to use the DataFrameReader to read the CSV file correctly before converting it to a Parquet file.
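    from pyspark.sql import SparkSession

    # Create (or reuse) the session; spark.read exposes the DataFrameReader
    spark = SparkSession \
        .builder \
        .appName("Protob Conversion to Parquet") \
        .config("spark.some.config.option", "some-value") \
        .getOrCreate()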
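For illustration, the read-then-convert step described above might look roughly like the sketch below. The helper name csv_to_parquet, its parameters, and the choice of header=True, inferSchema=True, and overwrite mode are assumptions made for this example, not part of the original code.

```python
def csv_to_parquet(spark, csv_path, parquet_path, header=True):
    """Read a CSV file into a DataFrame and write it back out as Parquet.

    `spark` is an existing SparkSession. The function name, the paths, and
    the reader/writer options are illustrative assumptions.
    """
    # spark.read is a DataFrameReader, so this returns a DataFrame
    # rather than the RDD that SparkContext.textFile would give.
    df = spark.read.csv(csv_path, header=header, inferSchema=True)

    # DataFrameWriter: replace any existing output at parquet_path.
    df.write.mode("overwrite").parquet(parquet_path)
    return df
```

With a session like the one created above, a call such as csv_to_parquet(spark, "input.csv", "output.parquet") would perform the conversion; both paths are placeholders.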