How can I implement MapReduce using shell commands?

How do you execute a Unix shell command (for example, an awk one-liner) in parallel across a cluster (step 1) and collect the results back on a central node (step 2)?

Update: I just found http://blog.last.fm/2009/04/06/mapreduce-bash-script which seems to do exactly what I need.

+3
2 answers

If all you are trying to do is fire off a bunch of remote commands, you can just use Perl. You can open an ssh command as a pipe and read its output back into Perl. (Of course, you need to set up SSH keys for passwordless access.)

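use strict;
use warnings;
# Run myScript on hostB over ssh and read its output through a pipe.
open(my $remote, '-|', 'ssh', 'user@hostB', 'myScript')
    or die "Cannot start ssh: $!";
while (my $line = <$remote>) {
    print $line;
}
close($remote) or warn "ssh exited with status $?";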
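The Perl snippet above reads from a single host. To cover both steps from the question in plain shell, one possible sketch is to start one ssh per host in the background, wait for all of them, and then concatenate the saved outputs. The names map_hosts and RUNNER are hypothetical and only serve this illustration; passwordless ssh access to every host is assumed.

```shell
# Hypothetical hook: the program used to reach a host (defaults to ssh).
RUNNER=${RUNNER:-ssh}

# map_hosts CMD HOST...
# Step 1: run CMD on every HOST at the same time, one background job per host.
# Step 2: collect the outputs on the local (central) node, ordered by host name.
map_hosts() {
    cmd=$1
    shift
    tmpdir=$(mktemp -d) || return 1
    for host in "$@"; do
        # Each job writes to its own file so outputs do not interleave.
        $RUNNER "$host" "$cmd" > "$tmpdir/$host.out" &
    done
    wait                      # block until every background job has finished
    cat "$tmpdir"/*.out       # gather the per-host results
    rm -rf "$tmpdir"
}
```

A call might look like map_hosts "awk '{ print \$1 }' /var/log/syslog" host1 host2, where the awk program and the log path are placeholders. This sketch does no error handling for hosts that fail; their output files are simply empty.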


+2

GNU parallel can run commands on every node in parallel.

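Run the same command on every node over ssh (-j is the number of parallel jobs, and each listed host is a node). Then "reduce" the combined output (here with sort and uniq).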

parallel -j 50 ssh {} "ls" ::: host1 host2 hostn | sort | uniq -c

Of course, "keyless ssh login" must be set up for each node.

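Here "ls" can be replaced with something more complex, but quoting and escape characters then need care. bashreduce is another option.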
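Counting identical lines with uniq -c works when each node emits raw records. If each node instead emits partial counts, the "reduce" step has to add them up per key. A minimal sketch of such a reducer, assuming tab-separated "key count" lines on standard input (the function name reduce_sum and the input format are assumptions, not part of the answer above):

```shell
# reduce_sum: read "key<TAB>count" lines on stdin and
# print one "key<TAB>total" line per key, sorted by key.
reduce_sum() {
    awk -F '\t' '
        { total[$1] += $2 }                      # accumulate the count for each key
        END {
            for (k in total)
                printf "%s\t%d\n", k, total[k]   # emit the per-key totals
        }
    ' | sort
}
```

It could be placed at the end of the pipeline in place of sort and uniq -c, with the remote "ls" replaced by a command that prints key and count pairs.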

+1

Source: https://habr.com/ru/post/1741480/

