Can Hadoop accept input from multiple directories and files?

How do I set FileInputFormat as the Hadoop input? arg[0]+"/*/*/*" does not match the files.

I want to read from multiple files laid out like this:

  Directory1
 --- Directory11
    --- Directory111
         --f1.txt
         --f2.txt
 --- Directory12
 Directory2
 --- Directory21

Is this possible in Hadoop? Thanks!

1 answer

You can read input from multiple directories and files using the * (glob) operator. Most likely the pattern does not match because the argument "arg[0]" is incorrect, so the files cannot be found.
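
To check how many * segments a pattern needs, here is a minimal sketch. It uses the JDK's java.nio glob matcher as a stand-in, not Hadoop's own glob implementation, on the assumption that both treat * as matching a single path component without crossing "/". The root "input" is a hypothetical value for arg[0], and the class name GlobDepthDemo is made up for illustration.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Stand-in demo: java.nio globbing, NOT Hadoop's glob code.
final class GlobDepthDemo {

    // Returns true when the whole path matches the glob pattern.
    static boolean matches(String glob, String path) {
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        String root = "input"; // hypothetical value of arg[0]
        String dir = root + "/Directory1/Directory11/Directory111";
        String file = dir + "/f1.txt";

        // Three stars reach the third-level directory...
        System.out.println(matches(root + "/*/*/*", dir));    // true
        // ...but not a file one level deeper.
        System.out.println(matches(root + "/*/*/*", file));   // false
        // A fourth star is needed to match the file itself.
        System.out.println(matches(root + "/*/*/*/*", file)); // true
    }
}
```

If arg[0] points one level higher or lower than expected, the same pattern lands on a different level of the tree, which would be consistent with the pattern not matching the files in the question.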

Alternatively, you can use FileInputFormat.addInputPath, or, if you need separate input formats or mappers per path, the MultipleInputs class.

A basic example of adding an input path:

 FileInputFormat.addInputPath(job, myInputPath); 

Here is an example of MultipleInputs

 MultipleInputs.addInputPath(job, inputPath1, TextInputFormat.class, MyMapper.class);
 MultipleInputs.addInputPath(job, inputPath2, TextInputFormat.class, MyOtherMapper.class);

This other question is very similar and has good answers: "Hadoop to reduce from multiple input formats".


Source: https://habr.com/ru/post/944546/