Looking for guidance on a deadlock scenario

I have a program that spawns many children and runs for long periods of time. The program contains a SIGCHLD handler for reaping defunct processes. Occasionally this program freezes. I believe the pstack output indicates a deadlock scenario. Is this the correct interpretation of this output?

 10533: ./asyncsignalhandler
 ff3954e4 lwp_park (0, 0, 0)
 ff391bbc slow_lock (ff341688, ff350000, 0, 0, 0, 0) + 58
 ff2c45c8 localtime_r (ffbfe7a0, 0, 0, 0, 0, 0) + 24
 ff2ba39c __posix_ctime_r (ffbfe7a0, ffbfe80e, ffbfe7a0, 0, 0, 0) + c
 00010bd8 gettimestamp (ffbfe80e, ffbfe828, 40, 0, 0, 0) + 18
 00010c50 sig_chld (12, 0, ffbfe9f0, 0, 0, 0) + 30
 ff3956fc __sighndlr (12, 0, ffbfe9f0, 10c20, 0, 0) + c
 ff38f354 call_user_handler (12, 0, ffbfe9f0, 0, 0, 0) + 234
 ff38f504 sigacthandler (12, 0, ffbfe9f0, 0, 0, 0) + 64
 --- called from signal handler with signal 18 (SIGCLD) ---
 ff391c14 pthread_mutex_lock (20fc8, 0, 0, 0, 0, 0) + 48
 ff2bcdec getenv (ff32a9ac, 770d0, 0, 0, 0, 0) + 1c
 ff2c6f40 getsystemTZ (0, 79268, 0, 0, 0, 0) + 14
 ff2c4da8 ltzset_u (4ede65ba, 0, 0, 0, 0, 0) + 14
 ff2c45d0 localtime_r (ffbff378, 0, 0, 0, 0, 0) + 2c
 ff2ba39c __posix_ctime_r (ffbff378, ffbff402, ffbff378, ff33e000, 0, 0) + c
 00010bd8 gettimestamp (ffbff402, ffbff402, 2925, 29a7, 79c38, 10b54) + 18
 00010ae0 main (1, ffbff4ac, ffbff4b4, 20c00, 0, 0) + 190
 00010928 _start (0, 0, 0, 0, 0, 0) + 108

I'm not really a C coder and am not familiar with the nuances of the language. I specifically use the reentrant version of ctime (ctime_r) in the program. Why is this still deadlocking?

 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <time.h>

 // import pid_t type
 #include <sys/types.h>

 // import _exit function
 #include <unistd.h>

 // import WNOHANG definition
 #include <sys/wait.h>

 // import errno variable
 #include <errno.h>

 // header for signal functions
 #include <signal.h>

 // function prototypes
 void sig_chld(int);
 char * gettimestamp(char *);

 // begin
 int main(int argc, char **argv)
 {
     time_t sleepstart;
     time_t sleepcheck;
     pid_t childpid;
     int i;
     unsigned int sleeptime;
     char sleepcommand[20];
     char ctime_buf[26];

     struct sigaction act;

     /* set stdout to line buffered for logging purposes */
     setvbuf(stdout, NULL, _IOLBF, BUFSIZ);

     /* Assign sig_chld as our SIGCHLD handler */
     act.sa_handler = sig_chld;

     /* We don't want to block any other signals */
     sigemptyset(&act.sa_mask);

     /*
      * We're only interested in children that have terminated, not ones
      * which have been stopped (eg user pressing control-Z at terminal)
      */
     act.sa_flags = SA_NOCLDSTOP;

     /* Make these values effective. */
     if (sigaction(SIGCHLD, &act, NULL) < 0)
     {
         printf("sigaction failed\n");
         return 1;
     }

     while (1)
     {
         for (i = 0; i < 20; i++)
         {
             /* fork/exec child program */
             childpid = fork();
             if (childpid == 0) // child
             {
                 //sleeptime = 30 + i;
                 sprintf(sleepcommand, "sleep %d", i);

                 printf("\t[%s][%d] Executing /bin/sh -c %s\n", gettimestamp(ctime_buf), getpid(), sleepcommand);

                 execl("/bin/sh", "/bin/sh", "-c", sleepcommand, NULL);

                 // only executed if exec fails
                 printf("[%s][%d] Error executing program, errno: %d\n", gettimestamp(ctime_buf), getpid(), errno);
                 _exit(1);
             }
             else if (childpid < 0) // error
             {
                 printf("[%s][%d] Error forking, errno: %d\n", gettimestamp(ctime_buf), getpid(), errno);
             }
             else // parent
             {
                 printf("[%s][%d] Spawned child, pid: %d\n", gettimestamp(ctime_buf), getpid(), childpid);
             }
         }

         // sleep is interrupted by SIGCHLD, so we can't simply sleep(5)
         printf("[%s][%d] Sleeping for 5 seconds\n", gettimestamp(ctime_buf), getpid());
         time(&sleepstart);
         while (1)
         {
             time(&sleepcheck);
             if (difftime(sleepcheck, sleepstart) < 5)
             {
                 sleep(1);
             }
             else
             {
                 break;
             }
         }
     }

     return(0);
 }

 char * gettimestamp(char *ctime_buf)
 {
     time_t now;

     time(&now);

     // format the timestamp and chomp the newline
     ctime_r(&now, ctime_buf);
     ctime_buf[strlen(ctime_buf) - 1] = '\0';

     return ctime_buf;
 }

 /*
  * The signal handler function -- only gets called when a SIGCHLD
  * is received, ie when a child terminates.
  */
 void sig_chld(int signo)
 {
     pid_t childpid;
     int childexitstatus;
     char ctime_buf[26];

     while (1)
     {
         childpid = waitpid(-1, &childexitstatus, WNOHANG);
         if (childpid > 0)
             printf("[%s][%d] Reaped child, pid: %d, exitstatus: %d\n", gettimestamp(ctime_buf), getpid(), childpid, WEXITSTATUS(childexitstatus));
         else
             return;
     }
 }

I'm running in a Solaris 9 environment. The program was compiled with Sun WorkShop 6 update 2 C 5.3 Patch 111679-15 2009/09/10 using the following command:

 cc -o asyncsignalhandler asyncsignalhandler.c -mt -D_POSIX_PTHREAD_SEMANTICS 

Is there a flaw in the program? Are there better ways to handle logging (with timestamps) from a signal handler?

1 answer

You are calling functions that are not async-signal-safe (see section 2.4.3 of the Single UNIX Specification) from the signal handler, in this case ctime_r() and printf() (the deadlock appears to be caused by a lock taken inside ctime_r() in the stack trace shown). These functions may take locks, and because a signal handler can be invoked at any point, the interrupted code may already hold that lock. The handler then waits for a lock that cannot be released until the handler returns, which is a deadlock.

As a rule, the only thing a signal handler should do is make a note for the main thread to examine later. For example, you could write() (which is an async-signal-safe function) to a file descriptor created with pipe(), and have your main loop (or another thread) run a select() loop that waits for data to appear on that pipe.

Also note that being thread-safe is not the same as being async-signal-safe. ctime_r is thread-safe - it takes locks to ensure that threads don't step on each other, and it uses a passed-in buffer rather than a static buffer. But it is not async-signal-safe, because it cannot tolerate being called re-entrantly at an arbitrary point in its execution.


Source: https://habr.com/ru/post/1442531/

