I have a Parquet file on my Hadoop cluster, and I want to capture the column names and their data types and write them to a text file. How do I get the column names and their Parquet data types using PySpark?
You can simply read the file and use its schema attribute to access the individual fields:

    sqlContext.read.parquet(path_to_parquet_file).schema.fields
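To cover the full task in the question (writing every column name and its data type to a text file), a minimal sketch along these lines should work. The paths here are placeholders, and on Spark 2.x and later you would typically use a SparkSession rather than sqlContext:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dump-schema").getOrCreate()

    # Placeholder path; substitute your own Parquet location.
    df = spark.read.parquet("hdfs:///path/to/file.parquet")

    # Each StructField in df.schema.fields carries the column name and
    # its Spark data type; simpleString() gives a compact type name.
    with open("schema.txt", "w") as f:
        for field in df.schema.fields:
            f.write("{}\t{}\n".format(field.name, field.dataType.simpleString()))

Note that open() writes to the driver's local filesystem, not HDFS; copy the file into HDFS afterwards if you need it there.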
Use dataframe.printSchema(), which prints the schema in tree format:

    df.printSchema()
    root
     |-- Name: string (nullable = true)
You can redirect the output of your program to capture it in a text file.
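Since printSchema() writes to standard output, the simplest capture is shell redirection, e.g. spark-submit job.py > schema.txt. To capture it from inside the script instead, one option is to redirect stdout; this sketch assumes df is the DataFrame loaded above and relies on printSchema() printing to the standard output stream:

    import io
    from contextlib import redirect_stdout

    # Capture the tree-formatted schema that printSchema() prints.
    buf = io.StringIO()
    with redirect_stdout(buf):
        df.printSchema()

    with open("schema.txt", "w") as f:
        f.write(buf.getvalue())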