Apache Avro: Map Uses CharSequence as Key

I am using Apache Avro.

My schema has a map type:

  {
    "name": "MyData",
    "type": {
      "type": "map",
      "values": {
        "type": "record",
        "name": "Person",
        "fields": [
          {"name": "name", "type": "string"},
          {"name": "age", "type": "int"}
        ]
      }
    }
  }

After compiling the schema, the generated Java class uses CharSequence as the key type for the MyData map.

Using CharSequence as the map key is very inconvenient. Is there a way to make Apache Avro generate String keys for the map?

PS

The problem is that, for example, dataMap.containsKey("SOME_KEY") returns false even though the key is there, just because the stored key is a CharSequence and not a String. Also, putting a record into the map under an existing key does not replace the old entry. That's why I say CharSequence is inconvenient as a key.
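To see why the lookup misses, here is a minimal sketch. As an assumption, StringBuilder stands in for Avro's Utf8 so the example runs without the Avro jar; what matters is that both are CharSequence implementations, and String.equals only returns true for another String, so a String with the same characters never matches the stored key.

```java
import java.util.HashMap;
import java.util.Map;

public class CharSequenceKeyDemo {
    // Builds a map shaped like the generated field (CharSequence keys) and
    // fills it the way a decoder would, with a non-String key.
    // StringBuilder is a stand-in for Avro's Utf8 (assumption for this sketch).
    static Map<CharSequence, Integer> decodedMap() {
        Map<CharSequence, Integer> dataMap = new HashMap<>();
        dataMap.put(new StringBuilder("SOME_KEY"), 1);
        return dataMap;
    }

    public static void main(String[] args) {
        Map<CharSequence, Integer> dataMap = decodedMap();

        // "SOME_KEY".equals(StringBuilder) is false, so the lookup misses.
        System.out.println(dataMap.containsKey("SOME_KEY")); // false

        // A String key with the same text becomes a second entry
        // instead of replacing the existing one.
        dataMap.put("SOME_KEY", 2);
        System.out.println(dataMap.size()); // 2
    }
}
```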

6 answers

There seems to be a workaround for this issue in Avro 1.6: you specify the string type in the project's POM file:

  <stringType>String</stringType> 

The release notes mention this under AVRO-803, although the plugin documentation does not reflect it.


This JIRA discussion is relevant. The reason for using CharSequence is backward compatibility.

And, as Charles Forsyth noted, a workaround was added for cases where String is required: set the avro.java.string property on the string type in the schema:

  { "type": "string", "avro.java.string": "String" } 

By default, strings are represented by Avro's own Utf8 class. Besides setting the property manually in the schema or configuring pom.xml, the avro-tools compile command even has a -string option:

 java -jar avro-tools-1.7.5.jar compile -string schema /path/to/schema .

Apparently, Avro uses CharSequence by default. I found a way to configure it to convert to String.

Starting with Avro 1.6.0, Avro can be made to always convert to String. There are several ways to achieve this. The first is to set the avro.java.string property in the schema to String:

  { "type": "string", "avro.java.string": "String" } 

I have not tested this.


Regardless of whether you can force Avro to use String, using CharSequence directly is a poor design, because CharSequence is not Comparable<CharSequence> and its contract does not even define equality between two sequences containing the same characters. I suggest filing this as a bug against Avro.
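One way to get content-based key semantics without converting every key is a TreeMap ordered by the characters themselves. This is a sketch under stated assumptions, not something Avro provides: CharSequence.compare requires Java 11 or later, the class name ContentKeyedMap is hypothetical, and StringBuilder again stands in for Avro's Utf8.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class ContentKeyedMap {
    // Copies any CharSequence-keyed map into a TreeMap whose ordering, and
    // therefore key lookup, depends only on character content, not on the
    // concrete CharSequence class (String, StringBuilder, Utf8, ...).
    static <V> Map<CharSequence, V> byContent(Map<? extends CharSequence, V> source) {
        Map<CharSequence, V> result = new TreeMap<>(CharSequence::compare);
        result.putAll(source);
        return result;
    }

    public static void main(String[] args) {
        // Decoded map with a non-String key (StringBuilder as a Utf8 stand-in).
        Map<CharSequence, Integer> decoded = new HashMap<>();
        decoded.put(new StringBuilder("SOME_KEY"), 1);

        Map<CharSequence, Integer> lookup = byContent(decoded);
        System.out.println(lookup.containsKey("SOME_KEY")); // true

        // Same characters compare as equal, so this replaces the value.
        lookup.put("SOME_KEY", 2);
        System.out.println(lookup.size()); // 1
    }
}
```

Lookups become O(log n) instead of hash-based, which is the trade-off for comparing by content.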


A quick fix (the value type could be some other object; in my case it is CharSequence):

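  import java.util.HashMap;
  import java.util.Map;

  // Copies a CharSequence-keyed map into a map keyed by String.
  Map<String, String> convertToStringMap(Map<CharSequence, CharSequence> map) {
      if (map == null) {
          return null;
      }
      Map<String, String> result = new HashMap<>();
      for (Map.Entry<CharSequence, CharSequence> entry : map.entrySet()) {
          CharSequence value = entry.getValue();
          // Guard against null values, which would otherwise throw a NullPointerException.
          result.put(entry.getKey().toString(), value == null ? null : value.toString());
      }
      return result;
  }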
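Since the values may be other objects (such as the generated Person records), a generic variant that converts only the keys and leaves values untouched may be more useful. This is a sketch; the names AvroMapKeys and convertKeysToString are hypothetical, and StringBuilder stands in for Avro's Utf8 in the demo.

```java
import java.util.HashMap;
import java.util.Map;

public class AvroMapKeys {
    // Returns a copy of the map with every key converted to String via
    // toString(); values (e.g. generated record objects) are kept as-is.
    // If two keys have the same text, only one entry survives.
    static <V> Map<String, V> convertKeysToString(Map<? extends CharSequence, V> map) {
        if (map == null) {
            return null;
        }
        Map<String, V> result = new HashMap<>();
        for (Map.Entry<? extends CharSequence, V> entry : map.entrySet()) {
            result.put(entry.getKey().toString(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<CharSequence, Integer> decoded = new HashMap<>();
        decoded.put(new StringBuilder("SOME_KEY"), 42);

        Map<String, Integer> byString = convertKeysToString(decoded);
        System.out.println(byString.containsKey("SOME_KEY")); // true
    }
}
```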

I think explicitly converting the String to Utf8 will work: instead of "some_key", use new Utf8("some_key") as the key when accessing the map.


Source: https://habr.com/ru/post/957193/

