Processing header files in Hadoop

I want to process many files in Hadoop - each file has some header information, followed by many records, each of which is stored in a fixed number of bytes. Any suggestions on this?

+3
3 answers

I think the best solution is to create a custom InputFormat.
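To illustrate that suggestion, here is a minimal sketch of what such an InputFormat could look like for a file with a fixed-size header followed by fixed-size binary records. Everything here is an assumption for illustration, not code from the original post: the class names, HEADER_BYTES, and RECORD_BYTES would need to be adjusted to the actual file layout.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class FixedWidthInputFormat
        extends FileInputFormat<LongWritable, BytesWritable> {

    static final int HEADER_BYTES = 128;  // assumed header size
    static final int RECORD_BYTES = 64;   // assumed record size

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // One split per file: the header is read once and record
        // boundaries are never cut by a split boundary.
        return false;
    }

    @Override
    public RecordReader<LongWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new FixedWidthRecordReader();
    }

    static class FixedWidthRecordReader
            extends RecordReader<LongWritable, BytesWritable> {

        private FSDataInputStream in;
        private long recordIndex = -1;
        private long totalRecords;
        private final LongWritable key = new LongWritable();
        private final BytesWritable value = new BytesWritable();
        private final byte[] buffer = new byte[RECORD_BYTES];

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context)
                throws IOException {
            FileSplit fileSplit = (FileSplit) split;
            Configuration conf = context.getConfiguration();
            FileSystem fs = fileSplit.getPath().getFileSystem(conf);
            in = fs.open(fileSplit.getPath());
            in.seek(HEADER_BYTES);  // skip (or first parse) the header
            totalRecords = (fileSplit.getLength() - HEADER_BYTES) / RECORD_BYTES;
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (recordIndex + 1 >= totalRecords) {
                return false;
            }
            recordIndex++;
            in.readFully(buffer);  // read exactly one fixed-size record
            key.set(recordIndex);
            value.set(buffer, 0, RECORD_BYTES);
            return true;
        }

        @Override public LongWritable getCurrentKey() { return key; }
        @Override public BytesWritable getCurrentValue() { return value; }
        @Override public float getProgress() {
            return totalRecords == 0 ? 1f : (float) (recordIndex + 1) / totalRecords;
        }
        @Override public void close() throws IOException {
            if (in != null) in.close();
        }
    }
}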

+4

There is one solution: you can check the byte offset of the lines that the mapper reads. It will be zero for the first line of a file, so you can add a check in map as follows:

public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // key is the byte offset of the current line within the file;
    // it is zero only for the first (header) line.
    if (key.get() > 0) {
        // your mapper code
    }
}

This way the mapper skips the header, because only lines with a non-zero byte offset are processed. Note that this assumes the header occupies only the first line of each file; a multi-line header would need a different check.
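For context, a complete mapper built around that check might look like the sketch below. The class name and output types are illustrative, not from the original post; the only essential part is the key.get() > 0 test.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper showing the offset check in context: with a
// line-based input format the key is the line's byte offset, which
// is zero only for the first (header) line of each file.
public class SkipHeaderMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {

    private final LongWritable one = new LongWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (key.get() > 0) {
            // Only non-header lines reach this point.
            context.write(value, one);
        }
    }
}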

+1

In addition to writing a custom FileInputFormat, you will also want to make sure the file is not splittable, so the reader knows how to process the records within the file.
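One way to do that, if you otherwise want to keep a stock reader, is to subclass an existing input format and override isSplitable (the subclass name below is hypothetical; isSplitable is Hadoop's actual method name):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical subclass: keep TextInputFormat's record reader but
// force each file into a single split.
public class NonSplittableTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;  // the whole file goes to one mapper
    }
}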

0

Source: https://habr.com/ru/post/1712210/

