Here is some moderately generic but simple code to execute pipelines, a program that I call pipeline. It is an SSCCE in a single file as presented, though I would keep stderr.h and stderr.c as separate files in a library to be linked with all my programs. (Actually, I have a more complex set of functions in my "real" stderr.c and stderr.h, but this is a good starting point.)
The code operates in two ways. If no arguments are given, it runs a built-in pipeline:
who | awk '{print $1}' | sort | uniq -c | sort -n
This counts the number of times each person is logged in on the system, presenting the list in order of increasing number of sessions. Alternatively, you can pass the pipeline you want to run as command-line arguments, using a quoted pipe '|' (or "|") to separate the commands:
Valid:
pipeline
pipeline ls '|' wc
pipeline who '|' awk '{print $1}' '|' sort '|' uniq -c '|' sort -n
pipeline ls
Invalid:
pipeline '|' wc -l
pipeline ls '|' '|' wc -l
pipeline ls '|' wc -l '|'
The last three invocations enforce pipes as separators: a pipe may not appear first, last, or directly after another pipe. The code does not error check every system call; it checks fork(), execvp() and pipe(), but skips checking dup2() and close(). It does not include diagnostic printing for the commands that are generated; a -x option to pipeline would be a sensible addition, causing it to print a trace of what it does. It also does not exit with the exit status of the last command in the pipeline.
Note that the code starts with a child being forked. The child will become the last process in the pipeline, but it first creates a pipe and forks another process to run the earlier processes in the pipeline. The mutually recursive functions are unlikely to be the only way of sorting things out, but they do leave minimal code repetition (earlier drafts of the code had the content of exec_nth_command() largely repeated in exec_pipeline() and exec_pipe_command()).
The process structure here is such that the original process only knows about the last process in the pipeline. It is possible to redesign things so that the original process is the parent of every process in the pipeline, so that the original process can report separately on the status of each command in the pipeline. I have not yet modified the code to allow for that structure; it will be a little more complex, though not hideously so.
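As a rough illustration of the missing "exit with the status of the last command" feature, here is a minimal sketch of converting a wait status into an exit code. The helper names status_to_exit_code() and run_and_get_exit_code() are hypothetical and not part of pipeline.c, and the 128 + signal number mapping is an assumption borrowed from common shell convention, not something the code above does.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map a wait status to a shell-style exit code */
static int status_to_exit_code(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);  /* Assumed convention, as in common shells */
    return 1;
}

/* Hypothetical helper: run one command, wait for it, return its exit code */
static int run_and_get_exit_code(char **argv)
{
    assert(argv != 0 && argv[0] != 0);
    pid_t pid = fork();
    if (pid < 0)
        return 1;
    if (pid == 0)
    {
        execvp(argv[0], argv);
        perror(argv[0]);
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return 1;
    return status_to_exit_code(status);
}
```

In pipeline.c, the same conversion could be applied to the status that corpse_collector() receives for the one direct child (the last command), and main() could return that value instead of 0.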
/* One way to create a pipeline of N processes */

/* stderr.h */
#ifndef STDERR_H_INCLUDED
#define STDERR_H_INCLUDED

static void err_setarg0(const char *argv0);
static void err_sysexit(char const *fmt, ...);
static void err_syswarn(char const *fmt, ...);

#endif /* STDERR_H_INCLUDED */

/* pipeline.c */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>
/*#include "stderr.h"*/

typedef int Pipe[2];

/* exec_nth_command() and exec_pipe_command() are mutually recursive */
static void exec_pipe_command(int ncmds, char ***cmds, Pipe output);

/* With the standard output plumbing sorted, execute Nth command */
static void exec_nth_command(int ncmds, char ***cmds)
{
    assert(ncmds >= 1);
    if (ncmds > 1)
    {
        pid_t pid;
        Pipe input;
        if (pipe(input) != 0)
            err_sysexit("Failed to create pipe");
        if ((pid = fork()) < 0)
            err_sysexit("Failed to fork");
        if (pid == 0)
        {
            /* Child */
            exec_pipe_command(ncmds-1, cmds, input);
        }
        /* Fix standard input to read end of pipe */
        dup2(input[0], 0);
        close(input[0]);
        close(input[1]);
    }
    execvp(cmds[ncmds-1][0], cmds[ncmds-1]);
    err_sysexit("Failed to exec %s", cmds[ncmds-1][0]);
    /*NOTREACHED*/
}

/* Given pipe, plumb it to standard output, then execute Nth command */
static void exec_pipe_command(int ncmds, char ***cmds, Pipe output)
{
    assert(ncmds >= 1);
    /* Fix stdout to write end of pipe */
    dup2(output[1], 1);
    close(output[0]);
    close(output[1]);
    exec_nth_command(ncmds, cmds);
}

/* Execute the N commands in the pipeline */
static void exec_pipeline(int ncmds, char ***cmds)
{
    assert(ncmds >= 1);
    pid_t pid;
    if ((pid = fork()) < 0)
        err_syswarn("Failed to fork");
    if (pid != 0)
        return;
    exec_nth_command(ncmds, cmds);
}

/* Collect dead children until there are none left */
static void corpse_collector(void)
{
    pid_t parent = getpid();
    pid_t corpse;
    int status;
    while ((corpse = waitpid(0, &status, 0)) != -1)
    {
        fprintf(stderr, "%d: child %d status 0x%.4X\n",
                (int)parent, (int)corpse, status);
    }
}

/* who | awk '{print $1}' | sort | uniq -c | sort -n */
static char *cmd0[] = { "who", 0 };
static char *cmd1[] = { "awk", "{print $1}", 0 };
static char *cmd2[] = { "sort", 0 };
static char *cmd3[] = { "uniq", "-c", 0 };
static char *cmd4[] = { "sort", "-n", 0 };

static char **cmds[] = { cmd0, cmd1, cmd2, cmd3, cmd4 };
static int ncmds = sizeof(cmds) / sizeof(cmds[0]);

static void exec_arguments(int argc, char **argv)
{
    /* Split the command line into sequences of arguments */
    /* Break at pipe symbols as arguments on their own */
    char **cmdv[argc/2];            // Way too many
    char *args[argc+1];
    int cmdn = 0;
    int argn = 0;

    cmdv[cmdn++] = &args[argn];
    for (int i = 1; i < argc; i++)
    {
        char *arg = argv[i];
        if (strcmp(arg, "|") == 0)
        {
            if (i == 1)
                err_sysexit("Syntax error: pipe before any command");
            if (args[argn-1] == 0)
                err_sysexit("Syntax error: two pipes with no command between");
            arg = 0;
        }
        args[argn++] = arg;
        if (arg == 0)
            cmdv[cmdn++] = &args[argn];
    }
    if (args[argn-1] == 0)
        err_sysexit("Syntax error: pipe with no command following");
    args[argn] = 0;
    exec_pipeline(cmdn, cmdv);
}

int main(int argc, char **argv)
{
    err_setarg0(argv[0]);
    if (argc == 1)
    {
        /* Run the built in pipe-line */
        exec_pipeline(ncmds, cmds);
    }
    else
    {
        /* Run command line specified by user */
        exec_arguments(argc, argv);
    }
    corpse_collector();
    return(0);
}

/* stderr.c */
/*#include "stderr.h"*/
#include <stdio.h>
#include <stdarg.h>
#include <errno.h>
#include <string.h>
#include <stdlib.h>

static const char *arg0 = "<undefined>";

static void err_setarg0(const char *argv0)
{
    arg0 = argv0;
}

static void err_vsyswarn(char const *fmt, va_list args)
{
    int errnum = errno;
    fprintf(stderr, "%s:%d: ", arg0, (int)getpid());
    vfprintf(stderr, fmt, args);
    if (errnum != 0)
        fprintf(stderr, " (%d: %s)", errnum, strerror(errnum));
    putc('\n', stderr);
}

static void err_syswarn(char const *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    err_vsyswarn(fmt, args);
    va_end(args);
}

static void err_sysexit(char const *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    err_vsyswarn(fmt, args);
    va_end(args);
    exit(1);
}
Signals and SIGCHLD
The POSIX Signal Concepts section discusses SIGCHLD:
Under SIG_DFL:
If the default action is to ignore the signal, delivery of the signal shall have no effect on the process.
Under SIG_IGN:
If the action for the SIGCHLD signal is set to SIG_IGN, child processes of the calling processes shall not be transformed into zombie processes when they terminate. If the calling process subsequently waits for its children, and the process has no unwaited-for children that were transformed into zombie processes, it shall block until all of its children terminate, and wait(), waitid(), and waitpid() shall fail and set errno to [ECHILD].
The description of <signal.h> has a table of default actions for signals, and for SIGCHLD, the default action is I (ignore the signal).
I added another function to the code above:
#include <signal.h>

typedef void (*SigHandler)(int signum);

static void sigchld_status(void)
{
    const char *handling = "Handler";
    SigHandler sigchld = signal(SIGCHLD, SIG_IGN);
    signal(SIGCHLD, sigchld);
    if (sigchld == SIG_IGN)
        handling = "Ignored";
    else if (sigchld == SIG_DFL)
        handling = "Default";
    printf("SIGCHLD set to %s\n", handling);
}
I called it immediately after the call to err_setarg0(), and it reports "Default" on both Mac OS X 10.7.5 and Linux (RHEL 5, x86/64). I validated its operation by running:
(trap '' CHLD; pipeline)
On both platforms, that reported "Ignored", and the pipeline command no longer reported the exit status of the child; it did not get it.
So, if a program is ignoring SIGCHLD, it does not generate any zombies, but it does wait until "all" of its children terminate. That is, until all of its direct children terminate; a process cannot wait on its grandchildren or more distant progeny, nor on its siblings, nor on its ancestors.
On the other hand, if the setting for SIGCHLD is the default, the signal is ignored, and zombies are created.
That is the most convenient behavior for this program as written. The corpse_collector() function has a loop that collects status information from any children. There is only ever one child at a time with this code; the rest of the pipeline runs as a child (of the child, of the child, ...) of the last process in the pipeline.
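The SIG_IGN behavior described above can be checked in isolation. The following is a minimal sketch, assuming a POSIX system that follows the quoted rule (as the Linux and Mac OS X systems mentioned did); the function name wait_with_sigchld_ignored() is hypothetical. It ignores SIGCHLD, forks a child that exits at once, and reports the errno from waitpid().

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: with SIGCHLD ignored, fork a short-lived child and
** wait for it. Returns the errno set by waitpid() on failure, 0 if
** waitpid() unexpectedly succeeded, or -1 if the setup itself failed. */
static int wait_with_sigchld_ignored(void)
{
    void (*old_handler)(int) = signal(SIGCHLD, SIG_IGN);
    if (old_handler == SIG_ERR)
        return -1;

    int result = 0;
    pid_t pid = fork();
    if (pid < 0)
        result = -1;
    else if (pid == 0)
        _exit(0);               /* Child: terminate immediately */
    else
    {
        int status;
        /* The child is never turned into a zombie, so no status arrives */
        if (waitpid(pid, &status, 0) < 0)
            result = errno;
    }

    signal(SIGCHLD, old_handler);   /* Restore the previous disposition */
    return result;
}
```

On a system that behaves as POSIX describes, the return value should be ECHILD, which matches the observation that pipeline stops reporting child statuses when SIGCHLD is ignored.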
However, I'm having trouble with zombies/corpses. My teacher had me implement it the same way you did, as cmd1 isn't the parent of cmd2 in the case of "cmd1 | cmd2 | cmd3". Unless I tell my shell to wait on each process (cmd1, cmd2 and cmd3), rather than just waiting on the last process (cmd3), the entire pipeline shuts down before the output can reach the end. I'm having trouble figuring out a good way to wait on them; my teacher said to use WNOHANG.
I'm not sure I understand the problem. With the code I provided, cmd3 is the parent of cmd2, and cmd2 is the parent of cmd1 in a 3-command pipeline (and the shell is the parent of cmd3), so the shell can only wait on cmd3. I did state originally:
The process structure here is such that the original process only knows about the last process in the pipeline. It is possible to redesign things so that the original process is the parent of every process in the pipeline, so that the original process can report separately on the status of each command in the pipeline. I have not yet modified the code to allow for that structure; it will be a little more complex, though not hideously so.
If your shell needs to wait on all three commands in the pipeline, you must use the alternative organization.
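For illustration only, here is a minimal sketch of that alternative organization, in which the original process forks every command itself and can therefore wait on each one. It is not the modified pipeline.c (which I have not written); the function name run_flat_pipeline() and its statuses parameter are hypothetical, and like the code above it skips error checks on dup2() and close().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run cmds[0] | cmds[1] | ... with the caller as the
** parent of every stage. The wait status of each stage that was started
** is stored in statuses[i]. Returns 0 on success, -1 on failure. */
static int run_flat_pipeline(int ncmds, char ***cmds, int *statuses)
{
    assert(ncmds >= 1);
    pid_t *pids = malloc(ncmds * sizeof(*pids));
    if (pids == 0)
        return -1;
    int prev_read = -1;     /* Read end of the pipe from the previous stage */
    int started = 0;
    int rc = 0;

    for (int i = 0; i < ncmds; i++)
    {
        int fd[2] = { -1, -1 };
        if (i < ncmds - 1 && pipe(fd) != 0)
        {
            rc = -1;
            break;
        }
        pid_t pid = fork();
        if (pid < 0)
        {
            if (fd[0] >= 0)
            {
                close(fd[0]);
                close(fd[1]);
            }
            rc = -1;
            break;
        }
        if (pid == 0)
        {
            /* Child: stdin from the previous pipe, stdout to the next one */
            if (prev_read >= 0)
            {
                dup2(prev_read, 0);
                close(prev_read);
            }
            if (fd[1] >= 0)
            {
                dup2(fd[1], 1);
                close(fd[0]);
                close(fd[1]);
            }
            execvp(cmds[i][0], cmds[i]);
            perror(cmds[i][0]);
            _exit(127);
        }
        pids[started++] = pid;
        /* Parent: close unused pipe ends, or readers never see EOF */
        if (prev_read >= 0)
            close(prev_read);
        if (fd[1] >= 0)
            close(fd[1]);
        prev_read = fd[0];
    }
    if (prev_read >= 0)
        close(prev_read);

    /* Every started stage is a direct child, so each can be waited for */
    for (int i = 0; i < started; i++)
    {
        int status = 0;
        if (waitpid(pids[i], &status, 0) < 0)
            rc = -1;
        statuses[i] = status;
    }
    free(pids);
    return rc;
}
```

The key difference from the recursive version is that the parent keeps an array of PIDs, so it does not matter in which order the commands finish; each status is collected by PID.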
The description of waitpid() includes:
The pid argument specifies a set of child processes for which status is requested. The waitpid() function shall only return the status of a child process from this set:
If pid is equal to (pid_t)-1, status is requested for any child process. In this respect, waitpid() is then equivalent to wait().
If pid is greater than 0, it specifies the process ID of a single child process for which status is requested.
If pid is 0, status is requested for any child process whose process group ID is equal to that of the calling process.
If pid is less than (pid_t)-1, status is requested for any child process whose process group ID is equal to the absolute value of pid.
The options argument is constructed from the bitwise-inclusive OR of zero or more of the following flags, defined in the <sys/wait.h> header:
...
WNOHANG The waitpid() function shall not suspend execution of the calling thread if status is not immediately available for one of the child processes specified by pid.
...
This means that if you are using process groups and the shell knows which process group the pipeline is running in (for example, because the pipeline is put into its own process group by the first process), then the parent can wait for the appropriate children to terminate.
...rambling... I think there is some useful information here; there probably should be more than I am writing, but my mind has gone blank.
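A minimal sketch of the process-group idea, assuming the parent is the direct parent of every process it wants to reap: children are placed in one process group with setpgid(), and the parent polls with waitpid(-pgid, ..., WNOHANG). The helper names spawn_in_group() and reap_process_group() are hypothetical, the 10 ms polling interval is arbitrary, and a real shell would also have to deal with terminal control, which is ignored here.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: fork a command into process group pgid.
** Pass pgid == 0 to make the child the leader of a new group.
** Returns the child's PID, or -1 if fork() failed. */
static pid_t spawn_in_group(char **argv, pid_t pgid)
{
    assert(argv != 0 && argv[0] != 0);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        setpgid(0, pgid);
        execvp(argv[0], argv);
        perror(argv[0]);
        _exit(127);
    }
    /* Set the group in the parent as well, so that it is in place
    ** whichever process runs first; failure here is harmless if the
    ** child has already done it */
    (void)setpgid(pid, (pgid == 0) ? pid : pgid);
    return pid;
}

/* Hypothetical helper: reap every child in process group pgid, polling
** with WNOHANG. Returns the number of children reaped. */
static int reap_process_group(pid_t pgid)
{
    assert(pgid > 1);
    const struct timespec delay = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    int reaped = 0;
    for (;;)
    {
        int status;
        pid_t pid = waitpid(-pgid, &status, WNOHANG);
        if (pid > 0)
            reaped++;               /* One child in the group has terminated */
        else if (pid == 0)
            nanosleep(&delay, 0);   /* Children remain, none ready yet */
        else if (errno != EINTR)
            break;                  /* ECHILD: no children left in the group */
    }
    return reaped;
}
```

A shell would typically do other work where this sketch sleeps; with nothing else to do, a blocking waitpid(-pgid, &status, 0) in the loop would be simpler than polling.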