Concat Avro Files Using avro-tools

I am trying to merge Avro files into one big file. The problem is that the concat command does not accept wildcards:

 hadoop jar avro-tools.jar concat /input/part* /output/bigfile.avro 

I get:

Exception in thread "main" java.io.FileNotFoundException: File does not exist: /input/part*

I tried wrapping the path in "" and '', but no luck.

1 answer

I quickly checked the Avro source code (1.7.7), and it seems that concat does not support glob patterns (basically, it calls FileSystem.open() on every argument except the last one).

This means that you must explicitly pass every file name as an argument. This is cumbersome, but the following should do what you want:

 IN=$(hadoop fs -ls /input/part* | awk '{printf "%s ", $NF}')
 hadoop jar avro-tools.jar concat ${IN} /output/bigfile.avro

It would be nice to add glob support to this command.
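Until then, the listing-and-concat workaround can be wrapped in a small helper. This is only a sketch: the function names list_paths and concat_glob are made up for illustration, and it assumes that hadoop fs -ls prints the usual 8-column listing with the path last, that avro-tools.jar is in the current directory, and that no path contains whitespace.

```shell
# Hypothetical helper: read `hadoop fs -ls` style output on stdin and print
# only the path column. Lines with fewer than 8 fields (for example a
# "Found N items" header) are skipped. Assumes paths contain no whitespace.
list_paths() {
  awk 'NF >= 8 { print $NF }'
}

# Hypothetical wrapper: concatenate every HDFS file matching a glob into one
# Avro file. $1 is the glob, $2 is the output file.
concat_glob() {
  pattern=$1
  output=$2
  # The pattern is quoted so that Hadoop, not the local shell, expands it.
  files=$(hadoop fs -ls "$pattern" | list_paths)
  if [ -z "$files" ]; then
    echo "no files match $pattern" >&2
    return 1
  fi
  # $files is deliberately unquoted: word splitting yields one argument per path.
  hadoop jar avro-tools.jar concat $files "$output"
}
```

It would be called as concat_glob '/input/part*' /output/bigfile.avro, with the pattern in single quotes. For a very large number of input files, the argument list may exceed the shell's command-line length limit.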


Source: https://habr.com/ru/post/1240854/

