How to combine multiple Hadoop MapReduce Jobs into one?

I have a huge amount of input data (which is why I use Hadoop), and there are several tasks that can each be solved with a series of MapReduce steps. In every case, the first mapper needs all of the data as input.

My goal: compute all of these tasks as quickly as possible.

Currently, I let them run sequentially, each one reading all of the data. I assume it would be faster to combine the tasks and execute the parts they share (for example, feeding all the data to the mapper) only once.

I was wondering whether and how I can combine these tasks. For each input key/value pair, the mapper could emit a "super key" that includes a task id and the task-specific key data, along with a value. The reducers would then receive key/value pairs in which the key identifies both the task and the task-specific key, and on seeing the "super key" they could decide which task to perform on the included key and values.

In pseudo code:

map(key, value):
    emit(SuperKey("Task 1", IncludedKey), value)
    emit(SuperKey("Task 2", AnotherIncludedKey), value)

reduce(key, values):
    if key.taskid == "Task 1":
        for value in values:
            // do stuff with key.includedkey and value
    else:
        // do something else

The key could be a WritableComparable that contains all the necessary information.
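To make the idea of such a composite key concrete, here is a minimal sketch in plain Python rather than Hadoop's Java API. The class name `SuperKey`, its two string fields, and the length-prefixed byte layout are assumptions made for illustration; `write` and `read_fields` only mirror the role of the serialization methods on a Hadoop WritableComparable.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class SuperKey:
    """Hypothetical composite key: compares by task id first, then included key."""

    task_id: str       # which task this record belongs to
    included_key: str  # the task-specific key

    def write(self) -> bytes:
        """Serialize both fields as length-prefixed UTF-8 strings."""
        out = b""
        for field in (self.task_id, self.included_key):
            raw = field.encode("utf-8")
            out += struct.pack(">I", len(raw)) + raw
        return out

    @classmethod
    def read_fields(cls, data: bytes) -> "SuperKey":
        """Rebuild a key from the bytes produced by write()."""
        fields, pos = [], 0
        for _ in range(2):
            (length,) = struct.unpack_from(">I", data, pos)
            pos += 4
            fields.append(data[pos:pos + length].decode("utf-8"))
            pos += length
        return cls(*fields)
```

Because the ordering puts the task id first, all keys of one task sort next to each other, which is what lets a reducer see one task's groups together.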

Note: the pseudocode suggests a terrible architecture, and this can certainly be done in a smarter way.
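One possibly cleaner shape for the pseudocode above is to replace the if/else chain with a lookup table of per-task reduce functions. The following is only a local simulation in plain Python, not Hadoop code: the two tasks (summing values per key, counting records by parity), the function names, and the `REDUCERS` table are all made up for illustration.

```python
from collections import defaultdict


def map_record(key, value):
    """Emit the same input record once per task, tagged with a task id."""
    yield ("Task 1", key), value
    yield ("Task 2", "even" if value % 2 == 0 else "odd"), value


def reduce_task1(included_key, values):
    # Hypothetical task 1: sum of values per original key.
    return sum(values)


def reduce_task2(included_key, values):
    # Hypothetical task 2: number of records per parity class.
    return len(values)


# Dispatch table instead of an if/else chain in the reducer.
REDUCERS = {"Task 1": reduce_task1, "Task 2": reduce_task2}


def run(records):
    groups = defaultdict(list)
    for key, value in records:  # the input is read only once
        for super_key, out_value in map_record(key, value):
            groups[super_key].append(out_value)
    results = {}
    for super_key in sorted(groups):  # stand-in for shuffle and sort
        task_id, included_key = super_key
        results[super_key] = REDUCERS[task_id](included_key, groups[super_key])
    return results
```

For example, `run([("a", 3), ("b", 4), ("a", 5)])` reads the three records once and produces results for both tasks, keyed by (task id, included key).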

My questions:

  • Is this a smart approach?
  • Are there any better alternatives?
  • Does it have any terrible flaws?
  • Do I need a custom Partitioner class for this approach?

. RDF, - , . Hadoop Counters , MapReduce.

Amazon Elastic MapReduce. .

  • ?

, . , -, , ( ).

  • ?

--Mapper Reducer, , . (, ).

  • - ?

, , , , node ; , , , , . .

  • Partitioner ?

, , . , , WritableComparable, . , Partitioner, (, KeyFieldBasedPartitioner, Text String , ).

. , , . !


:

  • Oozie

hadoop.


, - . , , node node. - , , u , , .

http://www.infoq.com/articles/introductionOozie


Source: https://habr.com/ru/post/1752322/

