GNU Parallel - which jobs failed?

I run a job on several different servers (up to 25) using GNU Parallel.

The shell script that currently implements this:

    parallel --tag --nonall -S $some_list_of_servers "some_command"
    state=$?
    echo -n "RESULT: "
    if [ "$state" -eq "0" ]
    then
        echo "All jobs successful"
    else
        echo "$state jobs failed"
    fi
    return $state

where some_list_of_servers is an array, and some_command is, for example, git fetch.

What I want is a LOT more info than just the number of failed jobs. I want to know which command failed and on which server.

I went through the man page, Google and SO, but could not find the switches I'm looking for.

Any help greatly appreciated.

Weedom

EDIT in response to answer 1:

I tried this and something strange is happening.

    weedom@host1 : ~/$ parallel --tag --nonall -j8 --joblog test.log -S host1,host2 uptime
    host2    10:41:17 up 36 days, 20:45,  1 user,  load average: 0.00, 0.00, 0.00
    host1    10:41:17 up 22:34,  3 users,  load average: 0.06, 0.11, 0.04
    weedom@host1 : ~/$ cat test.log
    Seq  Host   Starttime       Runtime            Send  Receive  Exitval  Signal  Command
    1    host1  1403689277.067  0.519999980926514  0     0        0        0       uptime

No matter how many hosts I add to -S, only the last of them seems to end up in test.log.

I have posted a follow-up question about this: GNU Parallel - --joblog only logs the last job

1 answer

You want to use the --joblog option, as shown in the docs. GNU Parallel even lets you rerun only the failed jobs with --resume-failed.
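A minimal sketch of how --resume-failed might be wrapped for reuse. The function name rerun_failed and its argument order are hypothetical and not part of GNU Parallel; the only options it passes are --joblog, --resume-failed and -S. It assumes the same inputs are piped in as on the first run, so that the sequence numbers in the joblog still match.

```shell
#!/bin/bash
# rerun_failed is a hypothetical wrapper name, not part of GNU Parallel.
# Usage: rerun_failed JOBLOG SERVERS COMMAND [ARGS...]   (inputs on stdin)
rerun_failed() {
    local joblog=$1 servers=$2
    shift 2
    # --resume-failed needs the joblog of the earlier run: it reruns the
    # jobs recorded there as failed and runs any that never started.
    parallel --joblog "$joblog" --resume-failed -S "$servers" "$@"
}
```

It would be called with the same input and command as the original run, for instance `seq 1 10 | rerun_failed some.log "hostA,hostB" ./some_script`, where the log, host and script names are placeholders.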

For example, running this script (saved as failjob):

    #!/bin/bash
    jobmod=$(( $1 % 3 ))
    if [ $jobmod == 0 ]
    then
        exit 1
    else
        exit 0
    fi

on multiple hosts:

    $ seq 1 10 | parallel --joblog out.log -S "srv01,srv02,srv03,srv04" ./failjob

gives

    $ more out.log
    Seq  Host   Starttime       Runtime  Send  Receive  Exitval  Signal  Command
    1    srv01  1403542514.713  0.267    0     0        0        0       ./failjob 1
    3    srv02  1403542514.717  0.266    0     0        1        0       ./failjob 3
    4    srv03  1403542514.719  0.266    0     0        0        0       ./failjob 4
    2    srv04  1403542514.715  0.397    0     0        0        0       ./failjob 2
    5    srv01  1403542514.983  0.231    0     0        0        0       ./failjob 5
    6    srv02  1403542514.986  0.368    0     0        1        0       ./failjob 6
    7    srv03  1403542514.988  0.388    0     0        0        0       ./failjob 7
    8    srv04  1403542515.121  0.437    0     0        0        0       ./failjob 8
    9    srv01  1403542515.221  0.343    0     0        1        0       ./failjob 9
    10   srv02  1403542515.356  0.388    0     0        0        0       ./failjob 10
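To turn that log into the "which command on which server" report the question asks for, the joblog can be filtered on its Exitval and Signal columns. The following is a rough sketch, assuming the tab-separated column order shown above; failed_jobs is a hypothetical helper name, and the sample file only exists to make the snippet self-contained (its rows are copied from the log above).

```shell
#!/bin/bash
# failed_jobs is a hypothetical helper name, not part of GNU Parallel.
# Assumed joblog columns (tab-separated):
# Seq Host Starttime Runtime Send Receive Exitval Signal Command
failed_jobs() {
    # Skip the header line, then print host, exit value and command for
    # every job whose Exitval (column 7) or Signal (column 8) is non-zero.
    awk -F'\t' 'NR > 1 && ($7 != 0 || $8 != 0) { printf "%s\t%s\t%s\n", $2, $7, $9 }' "$1"
}

# Sample joblog built from the header and three rows of the output above.
sample_log=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    Seq Host Starttime Runtime Send Receive Exitval Signal Command \
    1 srv01 1403542514.713 0.267 0 0 0 0 './failjob 1' \
    3 srv02 1403542514.717 0.266 0 0 1 0 './failjob 3' \
    9 srv01 1403542515.221 0.343 0 0 1 0 './failjob 9' > "$sample_log"

failed_jobs "$sample_log"
```

For the sample data this lists the two failed jobs, `./failjob 3` on srv02 and `./failjob 9` on srv01, each with exit value 1. In the script from the question, a call such as `failed_jobs out.log` could sit in the else branch next to the "jobs failed" message. Commands that themselves contain tab characters would break the column split.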

Source: https://habr.com/ru/post/971244/