I am trying to dynamically create an external Hive table on top of Parquet data files, without listing the column names and types in the Hive DDL. I have the Avro schema of the underlying Parquet file.
This is the DDL I tried:
CREATE EXTERNAL TABLE parquet_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS PARQUET
LOCATION 'hdfs://myParquetFilesPath'
TBLPROPERTIES ('avro.schema.url'='http://myHost/myAvroSchema.avsc');
My Hive table was successfully created with the correct schema, but when I try to read the data:
SELECT * FROM parquet_test;
I get the following error:
java.io.IOException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Expecting a AvroGenericRecordWritable
Is there a way to successfully create and read an external Hive table over Parquet files without listing the column names and types in the DDL?
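For reference, this is the kind of DDL I am trying to avoid, where every column has to be spelled out by hand (the column names and types below are only placeholders, not my real schema):

CREATE EXTERNAL TABLE parquet_test_explicit (
  -- placeholder columns; the real files have many more fields
  id BIGINT,
  name STRING,
  created_at TIMESTAMP
)
STORED AS PARQUET
LOCATION 'hdfs://myParquetFilesPath';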