How to create a nested object and an array in a parquet file?

How do I create a Parquet file with nested fields? I have the following code:

    public static void main(String[] args) throws IOException {
        int fileNum = 10;        // num of files constructed
        int fileRecordNum = 50;  // record num of each file
        int rowKey = 0;
        for (int i = 0; i < fileNum; ++i) {
            Map<String, String> metas = new HashMap<>();
            metas.put(HConstants.START_KEY, genRowKey("%10d", rowKey + 1));
            metas.put(HConstants.END_KEY, genRowKey("%10d", rowKey + fileRecordNum));

            ParquetWriter<Group> writer = initWriter("pfile/scanner_test_file" + i, metas);
            for (int j = 0; j < fileRecordNum; ++j) {
                rowKey++;
                Group group = sfg.newGroup()
                        .append("rowkey", genRowKey("%10d", rowKey))
                        .append("cf:name", "wangxiaoyi" + rowKey)
                        .append("cf:age", String.format("%10d", rowKey))
                        .append("cf:job", "student")
                        .append("timestamp", System.currentTimeMillis());
                writer.write(group);
            }
            writer.close();
        }
    }

I want to create two fields:

  • Hobbies containing a list of hobbies ("Swimming", "Kickboxing")
  • A teacher object that contains subfields, such as: {"bowl": "Rachel", "Teacherage": 50}

Can someone provide an example of how to do this in Java?

1 answer

Parquet is a columnar storage format: on disk it keeps only flat columns of primitive values, so a nested object is never stored as a single value. Nested structure is expressed in the schema (groups and repeated fields) rather than in the column values themselves. Logical data types are converted to a binary representation, that is, a byte array accompanied by metadata that tells the reader which conversion to apply to the bytes.
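
As a hedged illustration of the schema side: Parquet's schema language, as used by the parquet-mr example API, has repeated fields and group fields, so the two fields from the question could be declared directly. The sketch below assumes the parquet-column library is on the classpath. The class name NestedRecordSketch, the message name and the field names teachername and teacherage are made up for illustration (the key "bowl" in the question looks garbled), and sfg in the question is assumed to be a SimpleGroupFactory built from a schema like this one.

```java
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

// Hypothetical sketch: a record with a list of hobbies and a nested teacher object.
final class NestedRecordSketch {
    private NestedRecordSketch() {}

    // Schema with one repeated field (the list) and one group (the nested object).
    static MessageType buildSchema() {
        return MessageTypeParser.parseMessageType(
            "message student {\n"
          + "  required binary rowkey (UTF8);\n"
          + "  repeated binary hobbies (UTF8);\n"
          + "  optional group teacher {\n"
          + "    required binary teachername (UTF8);\n"
          + "    required int32 teacherage;\n"
          + "  }\n"
          + "}");
    }

    // Builds one in-memory record; nothing is written to disk here.
    static Group buildRecord(MessageType schema) {
        SimpleGroupFactory factory = new SimpleGroupFactory(schema);
        Group record = factory.newGroup()
            .append("rowkey", "0000000001")
            // A repeated field accepts several values under the same name.
            .append("hobbies", "Swimming")
            .append("hobbies", "Kickboxing");
        // addGroup returns the nested group, which is then filled like any other.
        record.addGroup("teacher")
            .append("teachername", "Rachel")
            .append("teacherage", 50);
        return record;
    }

    public static void main(String[] args) {
        System.out.println(buildRecord(buildSchema()));
    }
}
```

A record built this way could then be passed to writer.write(group) as in the question, provided the writer was initialised with the same schema. This uses a plain repeated field for the list; the Parquet format specification also describes a three-level LIST annotation that other tools may expect.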

I'm not sure exactly how you should implement your converter, but basically you work with Binary as the data container and write a converter for it. The converter that already exists for the String data type is an example to follow.
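
To make the converter idea concrete, here is a minimal sketch that uses only the Java standard library. It packs a list of strings into one byte array and unpacks it again. The class HobbyListCodec and its length-prefixed layout are assumptions made for illustration, not part of Parquet; in a real writer the resulting bytes would still have to be wrapped in a Parquet Binary value and stored in a binary column.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Parquet class: list of strings <-> single byte array.
final class HobbyListCodec {
    private HobbyListCodec() {}

    // Layout: [count:int32] then, per element, [length:int32][UTF-8 bytes].
    static byte[] encode(List<String> values) {
        List<byte[]> chunks = new ArrayList<>();
        int size = Integer.BYTES;
        for (String value : values) {
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            chunks.add(utf8);
            size += Integer.BYTES + utf8.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(size);
        buffer.putInt(values.size());
        for (byte[] chunk : chunks) {
            buffer.putInt(chunk.length);
            buffer.put(chunk);
        }
        return buffer.array();
    }

    // Reverses encode(): reads the count, then each length-prefixed string.
    static List<String> decode(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        int count = buffer.getInt();
        List<String> values = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] utf8 = new byte[buffer.getInt()];
            buffer.get(utf8);
            values.add(new String(utf8, StandardCharsets.UTF_8));
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] encoded = encode(Arrays.asList("Swimming", "Kickboxing"));
        System.out.println(decode(encoded)); // prints [Swimming, Kickboxing]
    }
}
```

The trade-off of this approach is that the list becomes opaque to other Parquet readers: they see a single binary value and need the same decoding logic to recover the elements.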


Source: https://habr.com/ru/post/1014740/

