The main difference is that MapReduce is apparently patentable. (Could not help myself, sorry ...)
On a more serious note, the MapReduce paper, as I recall, describes a methodology for performing massively parallel computations. It builds on the map/reduce construct, which has been well known for many years, but also takes care of matters beyond it, such as distributing the data. In addition, some constraints are imposed on the structure of the data operated on and returned by the functions used in the map-like and reduce-like parts of the computation (the data has to come in lists of key/value pairs), so you could say that MapReduce is a massive-parallelism-friendly specialization of the map + reduce combination. A single-machine sketch of that shape follows below.
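To make the shape concrete, here is a minimal single-machine sketch in Clojure; there is no actual distribution, and the function names are mine, not from any framework. The map step emits [key value] pairs, the pairs are grouped by key, and the reduce step folds the values collected under each key.

    (defn simple-map-reduce
      "Illustrative only: map-fn turns one input into a seq of [key value]
      pairs, the pairs are grouped by key, and reduce-fn folds the values
      collected under each key."
      [map-fn reduce-fn inputs]
      (->> inputs
           (mapcat map-fn)                ; map step: emit [key value] pairs
           (group-by first)               ; 'shuffle' step: group pairs by key
           (map (fn [[k pairs]]
                  [k (reduce-fn k (map second pairs))]))
           (into {})))

    ;; Word counting against this shape: each line emits [word 1] pairs,
    ;; and the per-key reduce sums them.
    (require '[clojure.string :as str])

    (simple-map-reduce
      (fn [line] (for [w (str/split line #"\s+")] [w 1]))
      (fn [_word counts] (reduce + counts))
      ["the quick brown fox" "the lazy dog jumps over the fox"])
    ;; => {"the" 3, "quick" 1, "brown" 1, "fox" 2, "lazy" 1, "dog" 1,
    ;;     "jumps" 1, "over" 1}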
As for the Wikipedia comment about the function being mapped in the functional programming map/reduce construct producing one value per input... well, sure, but there are no constraints at all on the type of that value. In particular, it could be a complex data structure, perhaps a list of things to which you would again apply a map/reduce transformation. Going back to the "word counting" example, you could very well have a function which, for a given portion of text, produces a data structure mapping words to occurrence counts, map that over your documents (or chunks of documents, as the case may be) and reduce the results; see the sketch below.
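A rough Clojure sketch of that variant (the names and the whitespace tokenization are my own assumptions): the mapped function returns a map from words to counts for one chunk of text, and the reduce step merges those maps, summing the counts of words that appear in several chunks.

    (require '[clojure.string :as str])

    (defn chunk-word-counts
      "For one chunk of text, build a map from word to occurrence count."
      [chunk]
      (frequencies (str/split chunk #"\s+")))

    (defn total-word-counts
      "Map the counting function over the chunks, then reduce the
      per-chunk maps into a single word->count map."
      [chunks]
      (reduce (partial merge-with +) {} (map chunk-word-counts chunks)))

    ;; (total-word-counts ["to be or not to be" "to see or not to see"])
    ;; => {"to" 4, "be" 2, "or" 2, "not" 2, "see" 2}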
In fact, that is exactly what happens in this article by Phil Hagelberg. It is a fun and supremely short example of a MapReduce-like word-counting computation implemented in Clojure with map and something equivalent to reduce (the (apply + (merge-with ...)) bit; merge-with is itself implemented in terms of reduce in clojure.core). The only difference from the Wikipedia example is that the objects being counted are URLs instead of arbitrary words; other than that, a word-counting algorithm implemented with map and reduce, MapReduce-style, is right there. The reason it might not fully qualify as an instance of MapReduce is the lack of a complex distribution of workloads: it all happens on a single box, albeit on all the CPUs that box provides.
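To illustrate the merge-with-via-reduce point, here is a rough sketch (not clojure.core's actual source, and the name is mine) of how merge-with can be expressed with reduce alone:

    (defn merge-with-via-reduce
      "Sketch of merge-with built on reduce: fold every entry of every
      map into an accumulator, combining values for duplicate keys with f."
      [f & maps]
      (reduce (fn [acc m]
                (reduce (fn [a [k v]]
                          (if (contains? a k)
                            (assoc a k (f (get a k) v))
                            (assoc a k v)))
                        acc
                        m))
              {}
              maps))

    ;; (merge-with-via-reduce + {"a" 1 "b" 2} {"a" 3 "c" 4})
    ;; => {"a" 4, "b" 2, "c" 4}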
For an in-depth treatment of the reduce function, also known as fold, see Graham Hutton's "A tutorial on the universality and expressiveness of fold". It is Haskell-based, but should be readable even if you don't know the language, as long as you're willing to look up a Haskell thing or two as you go... things like ++ = list concatenation, no deep Haskell magic.
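To give a flavour of what the tutorial means by universality, many list functions can themselves be rewritten as a single fold. A tiny Clojure sketch (the name map-via-reduce is mine, and this is not how clojure.core defines map):

    (defn map-via-reduce
      "map expressed as a left fold: start with an empty vector and
      conj (f x) onto the accumulator for each element."
      [f coll]
      (reduce (fn [acc x] (conj acc (f x))) [] coll))

    ;; (map-via-reduce inc [1 2 3])
    ;; => [2 3 4]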