The difference between MapReduce and the map/reduce combination in functional programming

I read about MapReduce at http://en.wikipedia.org/wiki/MapReduce and I understood the example of getting the word count across many "documents". However, I did not understand the following line:

Thus the MapReduce framework transforms a list of (key, value) pairs into a list of values. This behavior is different from the typical functional programming map and reduce combination, which accepts a list of arbitrary values and returns one single value that combines all the values returned by map.

Can someone clarify this difference (MapReduce vs. the map and reduce combination)? In particular, what does reduce do in functional programming?

Many thanks.

3 answers

The main difference is that MapReduce is apparently patentable. (Could not help myself, sorry ...)

On a more serious note: the MapReduce paper, as I recall, describes a methodology for performing computations in a massively parallel fashion. This methodology builds on the map/reduce construct, which was well known in functional programming circles years before, but it goes further, addressing issues such as data distribution. It also places certain constraints on the structure of the data consumed and returned by the functions used in the map-like and reduce-like parts of the computation (the data flows through lists of key/value pairs), so you could say that MapReduce is a massive-parallelism-friendly specialisation of the map + reduce combination.

As for the Wikipedia comment about the functional programming map/reduce construct producing one value for its input... Well, sure it does, but there are no constraints at all on the type of that value. In particular, it could be a complex data structure, e.g. a list of things to which you would again apply a map/reduce transformation. Going back to the "count the words" example, you could very well have a function which, for a given portion of text, produces a data structure mapping words to occurrence counts, map that over your documents (or chunks of documents, as the case may be) and reduce the results.
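A minimal sketch of that idea in Python (the documents are made-up strings, and `collections.Counter` stands in for the word-to-count structure described above):

```python
from collections import Counter
from functools import reduce

def count_words(text):
    """Map step: turn one document into a word -> count structure."""
    return Counter(text.split())

documents = ["the quick brown fox", "the lazy dog", "the fox"]

# map over the documents, then reduce the per-document counts into one
# global structure; Counter addition merges counts key by key.
per_doc_counts = map(count_words, documents)
total = reduce(lambda a, b: a + b, per_doc_counts, Counter())

assert total["the"] == 3
assert total["fox"] == 2
```

The value returned by each mapped function is itself a complex structure, which is exactly the point: reduce is free to combine such structures, not just scalars.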

In fact, that's exactly what happens in this article by Phil Hagelberg. It's a fun and extremely concise example of a MapReduce-word-counting-style computation implemented in Clojure with map and something equivalent to reduce (the (apply + (merge-with ...)) bit; merge-with is implemented in terms of reduce in clojure.core). The only difference between this and the Wikipedia example is that the objects being counted are URLs rather than arbitrary words. Other than that, you've got a word-counting algorithm implemented with map and reduce, MapReduce-style, right there. The reason it might not fully qualify as an instance of MapReduce is that there's no complex distribution of workloads involved: it all happens on a single box... albeit on all the CPUs that box provides.

For an in-depth treatment of the reduce function, also known as fold, see Graham Hutton's A tutorial on the universality and expressiveness of fold. It's Haskell-based, but should be readable even if you don't know the language, as long as you're willing to look up a Haskell thing or two as you go (++ is just list concatenation, no deep Haskell magic there).
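To answer the "what does reduce do" part of the question directly: reduce (fold) walks a list once, repeatedly applying a two-argument function to an accumulator and the next element. A sketch in Python:

```python
from functools import reduce

# reduce(f, [a, b, c], init) computes f(f(f(init, a), b), c):
# it "folds" a two-argument function over the list, left to right.
assert reduce(lambda acc, x: acc + x, [1, 2, 3], 0) == 6

# A hand-rolled equivalent makes the folding explicit:
def my_reduce(f, xs, init):
    acc = init
    for x in xs:
        acc = f(acc, x)
    return acc

# The accumulator need not be a scalar; here it's a list being built up.
assert my_reduce(lambda acc, x: acc + [x * x], [1, 2, 3], []) == [1, 4, 9]
```

The second assertion illustrates the point made above: the folded-up value can be any data structure, not just a single number.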


Using the word-count example: the purely functional map() would take a set of documents, optionally distribute subsets of that set, and for each document apply a function that yields a single value representing the number of words (or of individual words) in that document. A functional reduce() would then combine those per-document counts into a global count across all documents. Thus you get a single total (of all words, or of a specific word).

In MapReduce, by contrast, the map generates a (word, count) pair for each word in each document. The MapReduce reduce() then sums the counts of each individual word across documents, without mixing different words together. Thus you end up with a list of words, each paired with its count.
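A toy single-process sketch of the MapReduce shape just described (the function names are illustrative, not any real framework's API; a real framework would distribute the map, shuffle, and reduce phases across machines):

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Emit a (word, 1) pair for every word occurrence in the document."""
    return [(word, 1) for word in text.split()]

def shuffle(pairs):
    """Group emitted values by key: the step a framework runs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    """Combine all counts emitted for one word into a single (word, total) pair."""
    return (word, sum(counts))

documents = {1: "the quick brown fox", 2: "the lazy dog"}

emitted = [pair for doc_id, text in documents.items()
           for pair in map_phase(doc_id, text)]
result = dict(reduce_phase(w, cs) for w, cs in shuffle(emitted).items())

assert result["the"] == 2
```

Note that the output is a list of per-word values (one reduce invocation per key), not the single combined value a plain functional reduce would produce.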


MapReduce is a framework built around splitting a computation into parallelizable mappers and reducers. It builds on the familiar map and reduce idiom: if you can structure your task so that it can be performed by independent mappers and reducers, you can write it in a way that takes advantage of a MapReduce framework.

Imagine a Python interpreter that recognized tasks that could be computed independently and farmed them out to mapper or reducer nodes. If you wrote

 reduce(lambda x, y: x+y, map(int, ['1', '2', '3'])) 

or

 sum([int(x) for x in ['1', '2', '3']]) 

you would be using functional map and reduce methods within a MapReduce framework. With existing MapReduce frameworks there is a lot more plumbing involved, but it's the same concept.


Source: https://habr.com/ru/post/1299176/

