Avro default values are not applied when serializing JSON documents

I am trying to use Apache Avro to enforce a schema on data exported from Elasticsearch into many Avro documents in HDFS (to be queried with Drill). I'm having problems with Avro's default values.

Given this schema:

 {
   "namespace": "avrotest",
   "type": "record",
   "name": "people",
   "fields": [
     {"name": "firstname", "type": "string"},
     {"name": "age", "type": "int", "default": -1}
   ]
 }

I expect a JSON document such as {"firstname": "Jane"} to be serialized with the default value -1 for the age field. The Avro documentation describes the attribute as follows:

 default: the default value for this field, used when reading instances in which this field is absent (optional).

However, this does not happen:

 java -jar avro-tools-1.8.0.jar fromjson --schema-file p2.avsc jane.json > jane.avro
 Exception in thread "main" org.apache.avro.AvroTypeException: Expected int. Got END_OBJECT
     at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
     at org.apache.avro.io.JsonDecoder.readInt(JsonDecoder.java:172)
     at org.apache.avro.io.ValidatingDecoder.readInt(ValidatingDecoder.java:83)
     at org.apache.avro.generic.GenericDatumReader.readInt(GenericDatumReader.java:511)
     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:182)
     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
     at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
     at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:174)
     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
     at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:99)
     at org.apache.avro.tool.Main.run(Main.java:87)
     at org.apache.avro.tool.Main.main(Main.java:76)

Is this possible, or am I missing something?

1 answer

If you declare the field in the schema like this:

 {"name": "fieldName", "type": ["int", "null"], "default": null}

that is not enough to make the field optional: for a union type, the default value has to match the first type listed in the union (this is how the Avro 1.8 specification words it). Try putting "null" first instead:

 {"name": "fieldName", "type": ["null", "int"], "default": null}
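
Also note that the stack trace in the question suggests fromjson does not fill in missing fields at all. In Avro, defaults are applied during schema resolution, when the reader's schema has a field that the writer's schema lacks, not when a JSON document simply omits a field. One possible workaround is to fill in the defaults yourself before calling fromjson. Below is a minimal Python sketch, assuming a flat record schema like the one in the question; fill_defaults and SCHEMA are hypothetical names used for illustration, not part of Avro.

```python
import json

# The schema from the question, inlined so the sketch is self-contained.
SCHEMA = json.loads("""
{
  "namespace": "avrotest",
  "type": "record",
  "name": "people",
  "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "age", "type": "int", "default": -1}
  ]
}
""")


def fill_defaults(schema, record):
    """Return a copy of record with absent top-level fields set to their schema defaults."""
    filled = dict(record)
    for field in schema["fields"]:
        name = field["name"]
        if name in filled:
            continue
        if "default" not in field:
            # No default declared, so the input document is genuinely incomplete.
            raise ValueError(f"field {name!r} is missing and has no default")
        filled[name] = field["default"]
    return filled


if __name__ == "__main__":
    # Prints: {"firstname": "Jane", "age": -1}
    print(json.dumps(fill_defaults(SCHEMA, {"firstname": "Jane"})))
```

The output can be written to a file and passed to fromjson in place of the original document. This sketch only handles top-level fields, not nested records, and for union-typed fields Avro's JSON encoding expects non-null values to be wrapped with their type name, for example {"int": 5}, which the sketch does not do.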

Source: https://habr.com/ru/post/1244372/
