Reading a simple Avro file from HDFS

I am trying to do a simple read of an Avro file stored in HDFS. I found out how to read it when it is on the local file system:

    FileReader<GenericRecord> fileReader = DataFileReader.openReader(
            new File(filename), new GenericDatumReader<GenericRecord>());
    for (GenericRecord datum : fileReader) {
        String value = datum.get(1).toString();
        System.out.println("value = " + value);
    }
    fileReader.close();

However, my file is in HDFS, and I cannot pass openReader a Path or an FSDataInputStream. How can I simply read an Avro file in HDFS?

EDIT: I got this to work by creating my own class (SeekableHadoopInput) that implements SeekableInput. I took it from Ganlion on GitHub. Still, it seems there ought to be a built-in Hadoop/Avro integration path for this.

Thanks.

1 answer

The FsInput class (in the avro-mapred submodule, since it depends on Hadoop) can do this. It provides the seekable input stream that Avro data files require.

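    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.file.FileReader;
    import org.apache.avro.file.SeekableInput;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.DatumReader;
    import org.apache.avro.mapred.FsInput;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    Path path = new Path("/path/on/hdfs");
    Configuration config = new Configuration(); // make this your Hadoop env config
    SeekableInput input = new FsInput(path, config);
    DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
    FileReader<GenericRecord> fileReader = DataFileReader.openReader(input, reader);
    for (GenericRecord datum : fileReader) {
        System.out.println("value = " + datum);
    }
    fileReader.close(); // also closes underlying FsInput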
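To illustrate what "seekable input" means here, below is a rough, JDK-only sketch. The `Seekable` interface is a local stand-in that mirrors the methods of Avro's `SeekableInput` (`seek`, `tell`, `length`, `read`, `close`); it is not the Avro type itself, and `SeekableSketch` and `ByteArraySeekable` are hypothetical names made up for this example. The point is only that the reader must be able to reposition within the file, which a plain forward-only stream does not offer.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class SeekableSketch {

    // Local stand-in with the same method shapes as Avro's SeekableInput (assumption: illustration only).
    interface Seekable extends Closeable {
        void seek(long p) throws IOException;   // move to an absolute offset
        long tell() throws IOException;         // report the current offset
        long length() throws IOException;       // total size of the input
        int read(byte[] b, int off, int len) throws IOException;
    }

    // Hypothetical in-memory implementation; FsInput plays this role for a file on HDFS.
    static final class ByteArraySeekable implements Seekable {
        private final byte[] data;
        private int pos = 0;

        ByteArraySeekable(byte[] data) {
            this.data = data;
        }

        @Override
        public void seek(long p) {
            if (p < 0 || p > data.length) {
                throw new IllegalArgumentException("offset out of range: " + p);
            }
            pos = (int) p;
        }

        @Override
        public long tell() {
            return pos;
        }

        @Override
        public long length() {
            return data.length;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            if (len == 0) {
                return 0;
            }
            if (pos >= data.length) {
                return -1; // end of input
            }
            int n = Math.min(len, data.length - pos);
            System.arraycopy(data, pos, b, off, n);
            pos += n;
            return n;
        }

        @Override
        public void close() {
            // nothing to release for an in-memory buffer
        }
    }

    public static void main(String[] args) {
        byte[] bytes = "header|block-1|block-2".getBytes(StandardCharsets.US_ASCII);
        ByteArraySeekable in = new ByteArraySeekable(bytes);

        // Jump straight to the last block instead of reading from the start.
        in.seek(15);
        byte[] buf = new byte[7];
        int n = in.read(buf, 0, buf.length);
        System.out.println(new String(buf, 0, n, StandardCharsets.US_ASCII)); // prints "block-2"
        System.out.println(in.tell() == in.length());                          // prints "true"
    }
}
```

A custom class like the SeekableHadoopInput mentioned in the question would presumably implement the same handful of methods on top of a Hadoop stream; FsInput saves you from writing that yourself.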

Source: https://habr.com/ru/post/921206/

