Dynamically create an external Hive table using an Avro schema on Parquet data

I am trying to dynamically create an external Hive table over Parquet data files, without listing the column names and types in the Hive DDL. I have an Avro schema for the underlying Parquet files.

My attempt uses the DDL below:

 CREATE EXTERNAL TABLE parquet_test
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED AS PARQUET
 LOCATION 'hdfs://myParquetFilesPath'
 TBLPROPERTIES ('avro.schema.url'='http://myHost/myAvroSchema.avsc');
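
For reference, avro.schema.url points to a plain Avro schema file. The contents of myAvroSchema.avsc are not shown in the question; a minimal hypothetical schema of the same kind (record name and fields here are invented for illustration) would look like:

 {
   "type": "record",
   "name": "ParquetTestRecord",
   "fields": [
     {"name": "id", "type": "long"},
     {"name": "name", "type": "string"}
   ]
 }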

My Hive table was successfully created with the correct schema, but when I try to read the data:

 SELECT * FROM parquet_test; 

I get the following error:

 java.io.IOException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Expecting a AvroGenericRecordWritable 

Is there a way to successfully create and read the Parquet files without listing the column names and types in the DDL?

1 answer

The error occurs because the AvroSerDe expects Avro records (an AvroGenericRecordWritable), but the Parquet input format hands it Parquet records instead; mixing the Avro SerDe with STORED AS PARQUET in one table does not work. You can still reuse the Avro schema by first creating an Avro-backed table from it and then cloning that table's column layout into a Parquet table. The following queries work:

 CREATE TABLE avro_test
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
 STORED AS AVRO
 TBLPROPERTIES ('avro.schema.url'='http://myHost/myAvroSchema.avsc');

 CREATE EXTERNAL TABLE parquet_test
 LIKE avro_test
 STORED AS PARQUET
 LOCATION 'hdfs://myParquetFilesPath';
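
Here avro_test only serves as a schema source: CREATE TABLE ... LIKE copies its columns, while STORED AS PARQUET and LOCATION override the storage format and data path, so the files are read with the Parquet SerDe rather than the AvroSerDe. Once both tables exist, a quick sanity check (assuming the table names and paths above) might be:

 -- Confirm the column names and types were copied from avro_test
 DESCRIBE parquet_test;

 -- Read a few rows through the Parquet SerDe
 SELECT * FROM parquet_test LIMIT 10;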
