How to get column names and their parquet data types using pyspark?

I have a parquet file on my Hadoop cluster. I want to capture the column names and their data types and write them to a text file. How can I get the column names and their parquet data types using PySpark?

2 answers

You can simply read the file and use its schema to access the individual fields:

sqlContext.read.parquet(path_to_parquet_file).schema.fields
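For example, to write each column name and its data type to a text file, a minimal sketch could look like the following (it assumes an existing sqlContext and uses placeholder paths; each StructField exposes name and dataType attributes):

fields = sqlContext.read.parquet("/path/to/file.parquet").schema.fields
with open("schema.txt", "w") as out:
    for field in fields:
        # field.name is the column name; simpleString() gives a compact
        # type representation such as "string" or "int".
        out.write("%s\t%s\n" % (field.name, field.dataType.simpleString()))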

Use dataframe.printSchema(), which prints the schema in a tree format.

  

df.printSchema()
root
 |-- Name: string (nullable = true)

  

You can redirect your program's output to capture the schema in a text file.
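For instance, a minimal sketch that captures the schema from inside the program itself (assuming a DataFrame named df, and relying on the fact that printSchema() writes through Python's stdout):

import contextlib

# Redirect stdout while printSchema() runs so the tree-formatted
# schema lands in a text file instead of the console.
with open("schema.txt", "w") as out:
    with contextlib.redirect_stdout(out):
        df.printSchema()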


Source: https://habr.com/ru/post/1623568/

