How to write Avro to multiple output directories using Spark

Hi, there is an existing question about writing text data to multiple output directories in one Spark job using MultipleTextOutputFormat:

Write to multiple outputs by key Spark - one Spark job

I would like to ask whether there is a similar way to write Avro data to multiple directories.

I want to write the records of an Avro file to different directories according to their timestamp field, so that records whose timestamps fall on the same day go to the same directory.

1 answer

The AvroMultipleOutputs class makes it easy to write Avro output to multiple outputs.

  • Case one: writing to additional outputs other than the job's default output. Each additional output, or named output, can be configured with its own schema and output format.

  • Case two: writing data to different files whose paths are provided by the user.

AvroMultipleOutputs supports counters; they are disabled by default. The counter group is the name of the AvroMultipleOutputs class. The counter names are the same as the output names. They count the number of records written to each named output.
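
For case two, the per-record path can be derived from the timestamp field. Below is a minimal sketch in plain Java, assuming the timestamp is stored as epoch milliseconds and that day boundaries are taken in UTC. The class name `DayPath`, the method name `baseOutputPath`, and the `dt=.../part` layout are hypothetical choices for illustration, not part of any library.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper: maps a record timestamp to a per-day output path.
class DayPath {

    // Assumes epoch milliseconds and UTC day boundaries.
    // LocalDate.toString() yields ISO-8601 format (yyyy-MM-dd).
    static String baseOutputPath(long epochMillis) {
        LocalDate day = Instant.ofEpochMilli(epochMillis)
                .atZone(ZoneOffset.UTC)
                .toLocalDate();
        // "dt=" prefix and "part" file stem are illustrative choices.
        return "dt=" + day + "/part";
    }

    public static void main(String[] args) {
        // Two timestamps on the same UTC day map to the same directory.
        System.out.println(baseOutputPath(0L));          // prints dt=1970-01-01/part
        System.out.println(baseOutputPath(86_399_999L)); // prints dt=1970-01-01/part
        // One millisecond later the day rolls over.
        System.out.println(baseOutputPath(86_400_000L)); // prints dt=1970-01-02/part
    }
}
```

The resulting string is the kind of value one would supply as the user-provided output path when writing each record, so that all records from the same day end up under the same directory.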


Source: https://habr.com/ru/post/1260324/
