Hmm... there is some confusion here, or maybe I am about to create some.
Suppose you have a problem that can be solved in, say, O(n) time. If you run it through Hadoop on, let's assume, K machines, the work is split across those machines and the running time should drop by roughly a factor of K. So in your case the Hadoop job should be faster.
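To put rough, purely illustrative numbers on that: if a single machine scans 1 TB of input in 100 minutes, then 10 machines each scanning a disjoint 100 GB split should ideally finish in about 10 minutes, plus whatever coordination overhead the framework adds. That overhead is exactly what the rest of this answer is about.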
So what goes wrong?
Assuming you have a standard Hadoop installation with the default configuration, Hadoop runs in local (standalone) mode by default.
1) You are running the job on a single node, so you should not expect the runtime to be any lower than that of a plain standalone program. (It would be a different story if you used a multi-node cluster.)
Now the question arises: since only one machine is used either way, shouldn't the running time be about the same?
Answer: no. In Hadoop the data is first read by a record reader, which emits key-value pairs; these are passed to the mapper, which processes them and emits key-value pairs of its own (directly, if no combiner is used); the data is then sorted and shuffled, passed to the reduce phase, and finally written back to HDFS. As you can see, there is a lot more overhead here than in a plain single-threaded program, and that overhead is why you see the drop in performance.
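To make that pipeline concrete, here is a minimal word-count style job, essentially the standard Hadoop example; the class names and input/output paths are just illustrative. Even for a tiny input file, every stage marked in the comments still runs, which is where the extra overhead comes from.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // The record reader turns each input line into a (offset, line) pair
  // and hands it to this mapper, which emits (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE); // emitted pairs are then sorted and shuffled
      }
    }
  }

  // After the shuffle, the reducer receives (word, [1, 1, ...]) and
  // writes the summed counts back to HDFS.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // the optional combiner mentioned above
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

You would submit this with something like "hadoop jar wordcount.jar WordCount /input /output". In local mode all of the stages above run inside a single JVM, so none of that overhead is amortized across machines.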
If you want to see what Hadoop can really do, run the same job on a K-node cluster over petabytes of data, and then run your single-threaded application on the same data. I promise you will be amazed.