Running Hadoop MapReduce: is it possible to call external executables outside of HDFS?

Inside my mapper, I would like to call external software installed on the worker node, outside of HDFS. Is this possible? What is the best way to do it?

I understand that this may take away some of the advantages/scalability of MapReduce, but I would like to both interact with HDFS and call compiled/installed external programs from within my mapper to process some data.

+6
2 answers

Mappers (and reducers) are like any other process on the node: as long as the user the TaskTracker runs as has permission to run the executable, there is no problem doing so. There are several ways to invoke external processes, but since we are already in Java, ProcessBuilder is a logical place to start.
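A minimal sketch of that approach, assuming a hypothetical binary /usr/local/bin/mytool is installed on every worker node (the tool path and the output format are assumptions, not anything mandated by Hadoop):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that pipes each input record through an external
// binary installed locally on the worker node.
public class ExternalToolMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // /usr/local/bin/mytool is a placeholder for whatever is installed.
        ProcessBuilder pb = new ProcessBuilder("/usr/local/bin/mytool", value.toString());
        pb.redirectErrorStream(true); // merge stderr into stdout for simplicity
        Process process = pb.start();

        // Emit whatever the tool prints as the map output value.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                context.write(value, new Text(line));
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("mytool exited with code " + exitCode);
        }
    }
}
```

Note that spawning one process per record can be expensive; for high-volume input you would typically start the process once per mapper (e.g. in setup()) and stream records through it.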

EDIT: Just found that Hadoop has a class explicitly for this purpose: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/Shell.html
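For completeness, a short sketch using that utility; the command path and flag are placeholder assumptions:

```java
import java.io.IOException;

import org.apache.hadoop.util.Shell;

public class ShellExample {
    public static void main(String[] args) throws IOException {
        // Shell.execCommand runs the command and returns its stdout as a
        // String; it throws Shell.ExitCodeException on a non-zero exit code.
        String output = Shell.execCommand("/usr/local/bin/mytool", "--version");
        System.out.println(output);
    }
}
```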

+5

This is certainly doable. The best approach is to use Hadoop Streaming. As the documentation states:

Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run map/reduce jobs with any executable or script as the mapper and/or the reducer.

I usually start by wrapping external code with Hadoop Streaming. Depending on your language, there are many good examples of how to use it with Streaming; once you get data into your language of choice, you can usually pipe it on to another program if necessary. I have had several layers of programs in different languages playing together nicely with no more effort than running them on a regular Linux box, except that the outer layer was wired into Hadoop Streaming.
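As a sketch of what such an invocation looks like (the streaming jar path, the input/output directories, and the my_external_tool binary are all assumptions; the jar location varies by Hadoop version):

```
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input  /user/me/input \
    -output /user/me/output \
    -mapper /usr/local/bin/my_external_tool \
    -reducer /bin/cat
```

The mapper reads records on stdin and writes key/value pairs on stdout, so any installed executable works as long as it is present on every node (or is shipped with the job via the -files option).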

0
