You should read up on the basic concepts of MapReduce. Although sorting may not be necessary in some cases, the shuffle part of the Shuffle and Sort phase is an integral part of the MapReduce model. The MapReduce (Hadoop) framework has to group the mappers' output so that all values for a given key are sent to the same reducer; only then can the reducer actually "reduce" the data. With streaming, key/value pairs are separated by a tab character by default. From your code example in the other SO question, I see that you are not emitting key/value tuples, just single text strings.
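As a rough illustration of what a streaming mapper is expected to emit, here is a minimal sketch in Python. The word-count logic and the names `map_line` and `run` are hypothetical, chosen only to show the tab-separated "key, then value" layout; your own mapper would emit whatever key and value fit your data.

```python
import sys

def map_line(line):
    """Turn one input line into tab-separated "key<TAB>value" records."""
    for word in line.split():
        # Streaming treats the text before the first tab as the key
        # and the rest of the line as the value (default settings).
        yield f"{word}\t1"

def run(stream, out):
    """Read lines from `stream` and write one record per line to `out`."""
    for line in stream:
        for record in map_line(line):
            out.write(record + "\n")
```

In an actual mapper script you would call `run(sys.stdin, sys.stdout)` at the bottom of the file, since streaming feeds input on stdin and reads records from stdout.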
EDIT: adding the following in answer to the follow-up question "How do I make it sort numerically (for example, 9 before 10)?"
Alternative 1: Prepend zeros to your keys so that they all have the same length. "09" then sorts before "10".
Alternative 2: use KeyFieldBasedComparator with a numeric comparison option (its options follow Unix sort, e.g. -n), as pointed out in this SO question.
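To make Alternative 1 concrete, here is a small sketch of zero-padding in Python. The helper name `pad_key` and the sample keys are made up for illustration, and the width is an assumption: it has to be at least as long as your longest key.

```python
def pad_key(key, width=2):
    """Left-pad a numeric string key with zeros to a fixed width."""
    return key.zfill(width)

keys = ["9", "10", "1"]

# Plain string comparison is lexicographic, so "10" sorts before "9".
plain_order = sorted(keys)

# After padding, lexicographic order matches numeric order.
padded_order = sorted(pad_key(k) for k in keys)

print(plain_order)   # ['1', '10', '9']
print(padded_order)  # ['01', '09', '10']
```

The mapper would emit the padded key in place of the raw one; if the padding is unwanted in the final output, the reducer can strip it again (for example with `int(key)`).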