Check out the MultipleInputs class, which solves exactly this problem. It's pretty neat: for each input path you pass in the InputFormat and, optionally, the Mapper class.
If you are googling for code examples, search for "reduce-side join", which is where this method is commonly used.
On the other hand, sometimes I find it easier to just use a hack. For example, if you have one set of files that is space-delimited and another that is underscore-delimited, load both with the same mapper and TextInputFormat and tokenize each line on both possible delimiters. Count the number of tokens in the two results. In the word count example, pick whichever result has more tokens.
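A rough sketch of that delimiter hack in plain Java, leaving out the Hadoop Mapper boilerplate. The class and method names (DelimiterGuess, tokenize) are made up for illustration and are not part of Hadoop; inside a real mapper you would call something like this on the Text value of each record.

```java
// Hypothetical helper, not part of Hadoop: guesses a line's delimiter by
// splitting on each candidate and keeping the split that yields more tokens.
final class DelimiterGuess {
    static String[] tokenize(String line) {
        String[] bySpace = line.split(" ");
        String[] byUnderscore = line.split("_");
        // On a tie the space split wins; that tie-break is an arbitrary
        // assumption of this sketch, so pick one deliberately for your data.
        return bySpace.length >= byUnderscore.length ? bySpace : byUnderscore;
    }
}
```

For instance, DelimiterGuess.tokenize("the_quick_brown_fox") splits into four tokens on the underscore, because splitting that line on spaces yields only one.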
This also works if both sets of files use the same delimiter but have a different number of columns. You can tokenize on the comma and then count the tokens: if there are 5 tokens, the line is from dataset A; if there are 7, it is from dataset B.
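The column-count check could look something like the sketch below, again in plain Java without the Mapper wrapper. DatasetTagger, Source and classify are hypothetical names, and the UNKNOWN case is my own addition for lines that match neither count.

```java
// Hypothetical helper, not part of Hadoop: tags a comma-separated line by
// how many columns it has (5 means dataset A, 7 means dataset B).
final class DatasetTagger {
    enum Source { A, B, UNKNOWN }

    static Source classify(String line) {
        // The -1 limit keeps trailing empty fields, so "a,b,c,d," still
        // counts as 5 columns instead of 4.
        int columns = line.split(",", -1).length;
        if (columns == 5) return Source.A;
        if (columns == 7) return Source.B;
        // Assumption of this sketch: anything else is flagged, not guessed.
        return Source.UNKNOWN;
    }
}
```

In a mapper you would typically branch on the returned value and emit a tagged record, which is the same tagging a reduce-side join needs.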