Dynamically create an external Hive table using an Avro schema on Parquet data

I am trying to dynamically create an external Hive table over Parquet data files, without listing the column names and types in the Hive DDL. I have an Avro schema for the underlying Parquet files.

My attempt uses the DDL below:

 CREATE EXTERNAL TABLE parquet_test
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED AS PARQUET
 LOCATION 'hdfs://myParquetFilesPath'
 TBLPROPERTIES ('avro.schema.url'='http://myHost/myAvroSchema.avsc');
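
For reference, avro.schema.url points to a plain Avro schema file. The contents of myAvroSchema.avsc are not shown in the question; a minimal hypothetical schema of the same kind (record name and fields here are invented for illustration) would look like:

 {
   "type": "record",
   "name": "ParquetTestRecord",
   "fields": [
     {"name": "id", "type": "long"},
     {"name": "name", "type": "string"}
   ]
 }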

My Hive table was successfully created with the correct schema, but when I try to read the data:

 SELECT * FROM parquet_test; 

I get the following error:

 java.io.IOException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Expecting a AvroGenericRecordWritable 

Is there a way to successfully create and read the Parquet files without listing the column names and types in the DDL?

1 answer

The error occurs because the AvroSerDe expects Avro records (an AvroGenericRecordWritable), but the Parquet input format hands it Parquet records instead; mixing the Avro SerDe with STORED AS PARQUET in one table does not work. You can still reuse the Avro schema by first creating an Avro-backed table from it and then cloning that table's column layout into a Parquet table. The following queries work:

 CREATE TABLE avro_test
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED AS AVRO
 TBLPROPERTIES ('avro.schema.url'='http://myHost/myAvroSchema.avsc');

 CREATE EXTERNAL TABLE parquet_test
 LIKE avro_test
 STORED AS PARQUET
 LOCATION 'hdfs://myParquetFilesPath';
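
Here avro_test only serves as a schema source: CREATE TABLE ... LIKE copies its columns, while STORED AS PARQUET and LOCATION override the storage format and data path, so the files are read with the Parquet SerDe rather than the AvroSerDe. Once both tables exist, a quick sanity check (assuming the table names and paths above) might be:

 -- Confirm the column names and types were copied from avro_test
 DESCRIBE parquet_test;

 -- Read a few rows through the Parquet SerDe
 SELECT * FROM parquet_test LIMIT 10;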
