I have a problem with a simple Spark job that reads an Avro file and then saves it as a Hive Parquet table.
I have two types of files. In general they are the same, but the key structure differs slightly: the field names are different.
Type 1
root
|-- pk: struct (nullable = true)
|-- term_id: string (nullable = true)
Type 2
root
|-- pk: struct (nullable = true)
|-- id: string (nullable = true)
I read the Avro files using spark-avro and then map the resulting DataFrame to a bean as follows:
Dataset<SomeClass> df = avroDF.as(Encoders.bean(SomeClass.class));
SomeClass is a simple single-field class with a getter and a setter.
public class SomeClass {
    private String term_id;
    ...
}
So if I read Avro type 1, everything is fine, but if I read Avro type 2, an error occurs. And vice versa: if I change the field to private String id; then type 2 works and type 1 fails.
Is there a universal solution to my problem? I found @AvroName, but it does not allow setting multiple names. Thanks.
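For illustration, one possible workaround is to rename whichever key column is present to the bean's field name before calling as(...). The sketch below is not tested against Spark. It assumes the key field is a top-level column of avroDF at that point, and ColumnAliases and findAlias are hypothetical names, not part of Spark or spark-avro. Only the commented usage touches Spark, through Dataset.columns() and Dataset.withColumnRenamed(...).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper (not part of Spark or spark-avro): returns the first
// known alias of the bean field that is present among the given column names.
final class ColumnAliases {
    private ColumnAliases() {}

    static Optional<String> findAlias(String[] columns, List<String> aliases) {
        List<String> present = Arrays.asList(columns);
        // Aliases are checked in the order given, so list the preferred name first.
        return aliases.stream().filter(present::contains).findFirst();
    }
}

// Assumed usage, where avroDF is the Dataset<Row> read from the Avro file:
//
//   String found = ColumnAliases
//           .findAlias(avroDF.columns(), Arrays.asList("term_id", "id"))
//           .orElseThrow(() -> new IllegalStateException("no key column found"));
//   Dataset<SomeClass> ds = avroDF
//           .withColumnRenamed(found, "term_id")
//           .as(Encoders.bean(SomeClass.class));
```

With this approach SomeClass keeps a single field name (term_id here), and the list of aliases is the only place that has to know about the per-file differences.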