I wrote a DataFrame as a parquet file. And I would like to read the file using Hive, using metadata from the parquet.
Writing parquet output
_common_metadata part-r-00000-0def6ca1-0f54-4c53-b402-662944aa0be9.gz.parquet part-r-00002-0def6ca1-0f54-4c53-b402-662944aa0be9.gz.parquet _SUCCESS _metadata part-r-00001-0def6ca1-0f54-4c53-b402-662944aa0be9.gz.parquet part-r-00003-0def6ca1-0f54-4c53-b402-662944aa0be9.gz.parquet
Hive table
CREATE TABLE testhive ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION '/home/gz_files/result'; FAILED: SemanticException [Error 10043]: Either list of columns or a custom serializer should be specified
How can I output metadata from a parquet file?
If I open _common_metadata , I will be below the contents,
PAR1LHroot %TSN% %TS% %Etype% )org.apache.spark.sql.parquet.row.metadataโ{"type":"struct","fields":[{"name":"TSN","type":"string","nullable":true,"metadata":{}},{"name":"TS","type":"string","nullable":true,"metadata":{}},{"name":"Etype","type":"string","nullable":true,"metadata":{}}]}
Or how to parse a metadata file?
source share