Processing header files in Hadoop

I want to process many files in Hadoop - each file has some header information, followed by many records, each of which is stored in a fixed number of bytes. Any suggestions on this?

+3
3 answers

I think the best solution is to create a custom InputFormat.
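To illustrate that suggestion, here is a minimal sketch of what such an InputFormat could look like for a file with a fixed-size header followed by fixed-size binary records. Everything here is an assumption for illustration, not code from the original post: the class names, HEADER_BYTES, and RECORD_BYTES would need to be adjusted to the actual file layout.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class FixedWidthInputFormat
        extends FileInputFormat<LongWritable, BytesWritable> {

    static final int HEADER_BYTES = 128;  // assumed header size
    static final int RECORD_BYTES = 64;   // assumed record size

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // One split per file: the header is read once and record
        // boundaries are never cut by a split boundary.
        return false;
    }

    @Override
    public RecordReader<LongWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new FixedWidthRecordReader();
    }

    static class FixedWidthRecordReader
            extends RecordReader<LongWritable, BytesWritable> {

        private FSDataInputStream in;
        private long recordIndex = -1;
        private long totalRecords;
        private final LongWritable key = new LongWritable();
        private final BytesWritable value = new BytesWritable();
        private final byte[] buffer = new byte[RECORD_BYTES];

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context)
                throws IOException {
            FileSplit fileSplit = (FileSplit) split;
            Configuration conf = context.getConfiguration();
            FileSystem fs = fileSplit.getPath().getFileSystem(conf);
            in = fs.open(fileSplit.getPath());
            in.seek(HEADER_BYTES);  // skip (or first parse) the header
            totalRecords = (fileSplit.getLength() - HEADER_BYTES) / RECORD_BYTES;
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (recordIndex + 1 >= totalRecords) {
                return false;
            }
            recordIndex++;
            in.readFully(buffer);  // read exactly one fixed-size record
            key.set(recordIndex);
            value.set(buffer, 0, RECORD_BYTES);
            return true;
        }

        @Override public LongWritable getCurrentKey() { return key; }
        @Override public BytesWritable getCurrentValue() { return value; }
        @Override public float getProgress() {
            return totalRecords == 0 ? 1f : (float) (recordIndex + 1) / totalRecords;
        }
        @Override public void close() throws IOException {
            if (in != null) in.close();
        }
    }
}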

+4

There is one solution: you can check the byte offset of the lines that the mapper reads. It will be zero for the first line of a file, so you can add a check in map as follows:

public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // key is the byte offset of the current line within the file;
    // it is zero only for the first (header) line.
    if (key.get() > 0) {
        // your mapper code
    }
}

This way the mapper skips the header, because only lines with a non-zero byte offset are processed. Note that this assumes the header occupies only the first line of each file; a multi-line header would need a different check.
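For context, a complete mapper built around that check might look like the sketch below. The class name and output types are illustrative, not from the original post; the only essential part is the key.get() > 0 test.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper showing the offset check in context: with a
// line-based input format the key is the line's byte offset, which
// is zero only for the first (header) line of each file.
public class SkipHeaderMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {

    private final LongWritable one = new LongWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (key.get() > 0) {
            // Only non-header lines reach this point.
            context.write(value, one);
        }
    }
}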

+1

In addition to writing a custom FileInputFormat, you will also want to make sure the file is not splittable, so the reader knows how to process the records within the file.
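One way to do that, if you otherwise want to keep a stock reader, is to subclass an existing input format and override isSplitable (the subclass name below is hypothetical; isSplitable is Hadoop's actual method name):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical subclass: keep TextInputFormat's record reader but
// force each file into a single split.
public class NonSplittableTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;  // the whole file goes to one mapper
    }
}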

0

Source: https://habr.com/ru/post/1712210/

