Is MapReduce the best way to solve problems in a single-machine multi-core environment with internal memory?

Question


Is the MapReduce abstraction a good fit for problem solving even on one machine? For example, I have a 12-core machine, and I have to count words in thousands of files (a classic MapReduce example).

Is using a MapReduce implementation, with Mappers and Reducers running in multiple threads, a good way to solve this problem, considering that everything runs on the same machine with one hard drive?

I think my question boils down to the following: Is the MapReduce paradigm good only for working in a cluster of machines?

+6

algorithm concurrency parallel-processing mapreduce

Felipe Hummel Jun 24 '11 at 20:08


2 answers

In general, you may have two situations:

1. Your problem is small enough to fit into the memory of a single system, and that system has enough processing power to solve the problem within the required time.
2. Your problem is too big, in one of two ways:
   2.1 It takes too long to run (disk I/O and/or processor time).
   2.2 It is too large to fit into memory (RAM).

In cases 2.1 and 2.2, the MapReduce paradigm helps by breaking the work down into many small pieces. If you need more CPU power, you simply add more processors.
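To make "breaking the work down into many small pieces" concrete, here is a minimal Python sketch of the word count from the question. The names `map_count`, `merge_counts` and `word_count` are made up for illustration, and the pieces are plain strings rather than files; this is one possible shape under those assumptions, not a reference implementation.

```python
from collections import Counter
from concurrent.futures import Executor, ThreadPoolExecutor
from functools import reduce
from typing import Iterable


def map_count(text: str) -> Counter:
    """Map step: count the words in one piece of the input."""
    return Counter(text.lower().split())


def merge_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: fold one partial count into the running total."""
    left.update(right)
    return left


def word_count(pieces: Iterable[str], executor: Executor) -> Counter:
    """Run the map step on every piece via the executor, then reduce."""
    partials = executor.map(map_count, pieces)
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The end"]
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(word_count(docs, pool).most_common(1))  # [('the', 3)]
```

Because `word_count` only depends on the `Executor` interface, the worker pool can be swapped (for example for a `ProcessPoolExecutor`) without touching the map and reduce functions, which is the "just add processors" property in miniature.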

So, if you have one system and it turns out that your problem is too big to fit into memory (case 2.2), you can still benefit from the fact that MapReduce can easily keep part of the problem on disk until that part is due to be processed.
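A rough sketch of that idea in Python, assuming a hypothetical layout of `part-N.txt` spill files: the map step writes each word to a partition file chosen by a hash, and the reduce step counts one partition at a time, so only one partition's words need to be in RAM at once. The function names and the file layout are illustrative assumptions, not how any particular MapReduce framework does it.

```python
import os
import tempfile
import zlib
from collections import Counter
from typing import Iterable, Iterator, Tuple


def spill_words(lines: Iterable[str], spill_dir: str, partitions: int = 4) -> None:
    """Map step: append every word to one of `partitions` spill files on disk."""
    files = [
        open(os.path.join(spill_dir, f"part-{i}.txt"), "a", encoding="utf-8")
        for i in range(partitions)
    ]
    try:
        for line in lines:
            for word in line.lower().split():
                # A stable hash sends every occurrence of a word to the same file.
                index = zlib.crc32(word.encode("utf-8")) % partitions
                files[index].write(word + "\n")
    finally:
        for handle in files:
            handle.close()


def reduce_partitions(spill_dir: str, partitions: int = 4) -> Iterator[Tuple[str, int]]:
    """Reduce step: count one partition at a time and yield (word, count) pairs."""
    for i in range(partitions):
        path = os.path.join(spill_dir, f"part-{i}.txt")
        if not os.path.exists(path):
            continue
        with open(path, encoding="utf-8") as handle:
            counts = Counter(word.rstrip("\n") for word in handle)
        yield from counts.items()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spill_dir:
        spill_words(["to be or not to be"], spill_dir)
        print(sorted(reduce_partitions(spill_dir)))
```

Since each word always lands in the same partition, the per-partition counts never need to be merged with each other.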

An important caveat: if your problem is small enough to fit into memory and small enough to process on one system, then a dedicated (non-MapReduce) solution can be much faster.
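For comparison, a dedicated solution to the word count can be as small as the sketch below: one pass over the files and one in-memory table, with no separate map, shuffle, or reduce stages. The function name `count_words` is hypothetical, and the sketch assumes the files are text that can be read as UTF-8.

```python
from collections import Counter
from typing import Iterable


def count_words(paths: Iterable[str]) -> Counter:
    """Count words across all files in a single pass, using one shared table."""
    counts: Counter = Counter()
    for path in paths:
        # errors="replace" keeps the sketch from failing on stray bytes.
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                counts.update(line.lower().split())
    return counts
```

Whether this actually beats a MapReduce-style version on a given machine depends on the data and the disk, so it is worth measuring rather than assuming.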

+7

Niels Basjes Jun 25 '11 at 7:10


Kiril · Accepted Answer · 2011-06-24T20:50:34+0000

I think my question boils down to the following: Is the MapReduce paradigm good only for working in a cluster of machines?

Generally yes: MapReduce is likely to be less efficient on a single PC. I cannot think of many (if any) situations in which MapReduce has an advantage over more resource-optimized approaches in a non-distributed environment (that is, on one PC with one hard drive). In other words, if you are trying to squeeze every bit of performance out of one PC, you will most likely do better with a purpose-built solution than with MapReduce.

However, if you plan to add more nodes and build a cluster, then MapReduce becomes the go-to paradigm.
