Catching signals in a multithreaded program

I have a large program with a large number of threads that needs to be as resilient as possible. I need to catch every SIGBUS and SIGSEGV signal and, if necessary, either reinitialize the offending thread or disable it and continue with reduced functionality.

My first thought is to call setjmp, install signal handlers that log the problem, and then longjmp back to a recovery point in the thread. The problem is that the signal handler has to determine which thread the signal came from so that it uses the right jump buffer, since jumping back into the wrong thread would be useless.

Does anyone know how to identify the offending thread from inside a signal handler?

+6
3 answers

Using syscall(SYS_gettid) works for me on my Linux box. Compile with: gcc pt.c -lpthread -Wall -Wextra

    // pt.c
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <pthread.h>
    #include <sched.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <setjmp.h>
    #include <signal.h>
    #include <string.h>
    #include <ucontext.h>
    #include <stdlib.h>

    /* One jump buffer per kernel thread ID. This assumes every tid is
       below 65536, which is only true if kernel.pid_max is that small. */
    static sigjmp_buf jmpbuf[65536];

    static void handler(int sig, siginfo_t *siginfo, void *context)
    {
        //ucontext_t *ucontext = context;
        pid_t tid = syscall(SYS_gettid);

        (void)siginfo;
        (void)context;
        /* printf is not async-signal-safe; it is here for demonstration only. */
        printf("Thread %d in handler, signal %d\n", tid, sig);
        siglongjmp(jmpbuf[tid], 1);
    }

    static void *threadfunc(void *data)
    {
        int index, segvindex = *(int *)data;
        pid_t tid = syscall(SYS_gettid);

        for (index = 0; index < 500; index++) {
            /* Recovery point: siglongjmp from the handler lands here. */
            if (sigsetjmp(jmpbuf[tid], 1) == 1) {
                printf("Recovery of thread %d\n", tid);
                continue;
            }
            printf("Thread %d, index %d\n", tid, index);
            if (index % 5 == segvindex) {
                printf("%zu\n", strlen((char *)2)); // SIGSEGV
            }
            sched_yield(); /* replaces the deprecated pthread_yield() */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t thread1, thread2, thread3;
        int segvindex1 = rand() % 5;
        int segvindex2 = rand() % 5;
        int segvindex3 = rand() % 5;
        struct sigaction sact;

        memset(&sact, 0, sizeof sact);
        sact.sa_sigaction = handler;
        sact.sa_flags = SA_SIGINFO;
        if (sigaction(SIGSEGV, &sact, NULL) < 0) {
            perror("sigaction");
            return 1;
        }
        pthread_create(&thread1, NULL, &threadfunc, (void *) &segvindex1);
        pthread_create(&thread2, NULL, &threadfunc, (void *) &segvindex2);
        pthread_create(&thread3, NULL, &threadfunc, (void *) &segvindex3);
        pthread_join(thread1, NULL);
        pthread_join(thread2, NULL);
        pthread_join(thread3, NULL);
        return 0;
    }

For better portability, pthread_self can be used instead. It is async-signal-safe.

However, the thread that receives SIGSEGV should start a replacement thread using only async-signal-safe means and should not call siglongjmp, because jumping out of the handler can lead to non-async-signal-safe functions being called.

+2

I'm going to assume that you have thought this through and have a real reason to believe your program will be more resilient if it tries to carry on after a SIGSEGV. Bear in mind that segfaults point to dangling pointers and similar abuses, which may also be overwriting unpredictable places in your process's address space without segfaulting.

Since you have considered this very carefully, and you have determined (somehow) that the particular way your application segfaults cannot be hiding corruption of the bookkeeping data used to cancel and restart threads, and that you have perfect cancellation logic for these threads (also very rare), let's get on and solve the problem.

On Linux, the SIGSEGV handler executes in the thread of the faulting instruction (man 7 signal). We avoid pthread_self() because it was not listed as async-signal-safe (POSIX added it to the list only in POSIX.1-2008 TC1), but the internet at large seems to agree that syscall (man 2 syscall) is safe, so we can get the thread ID via syscall(SYS_gettid). We therefore maintain a mapping from pthread_t (pthread_self) to pid (gettid()). Since write() is also safe, we can trap the SEGV, write the current thread ID down a pipe, and then pause until pthread_cancel terminates us.

We also need a monitor thread to keep an eye on when things go pear-shaped. The monitor thread watches the read end of the pipe for information about a terminated thread and can restart it.

Because I think that pretending to handle SIGSEGV is daft, I'm going to name the structures that do it daft_thread_t and so on. someone_please_fix_me represents your broken code. The monitor thread is main(). When a thread segfaults, the signal handler traps it and writes the thread's ID down the pipe; the monitor reads the pipe, cancels the thread with pthread_cancel and pthread_join, and restarts it.

    #include <assert.h>
    #include <errno.h>
    #include <pthread.h>
    #include <signal.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define MAX_DAFT_THREADS (1024) // arbitrary

    // For calls that return -1 and set errno.
    #define CHECK_OSCALL(call, onfail) { \
        if ((call) == -1) { \
            char buf[512]; \
            strerror_r(errno, buf, sizeof(buf)); \
            fprintf(stderr, "%s@%d failed: %s\n", __FILE__, __LINE__, buf); \
            onfail; \
        } \
    }

    // pthread_* functions return the error number instead of setting errno.
    #define CHECK_PTHREAD(call, onfail) { \
        int err_ = (call); \
        if (err_ != 0) { \
            fprintf(stderr, "%s@%d failed: %s\n", __FILE__, __LINE__, strerror(err_)); \
            onfail; \
        } \
    }

    pid_t gettid(void); // defined below; wraps syscall(SYS_gettid)

    /*********************** daft thread accounting *****************/
    typedef void* (*threadproc_t)(void* arg);

    struct daft_thread_t {
        threadproc_t start_routine;
        void* start_routine_arg;
        pthread_t pthread;
        pid_t tid;
    };

    struct daft_thread_accounting_info_t {
        int monitor_pipe[2];
        pthread_mutex_t info_lock;
        size_t daft_thread_count;
        struct daft_thread_t daft_threads[MAX_DAFT_THREADS];
    };

    static struct daft_thread_accounting_info_t g_thread_accounting;

    void daft_thread_accounting_info_init(struct daft_thread_accounting_info_t* inf)
    {
        memset(inf, 0, sizeof(*inf));
        pthread_mutex_init(&inf->info_lock, NULL);
        CHECK_OSCALL(pipe(inf->monitor_pipe), abort());
    }

    struct daft_thread_wrapper_data_t {
        struct daft_thread_t* thread_info;
    };

    // Records the kernel thread ID, then runs the real thread function.
    static void* daft_thread_wrapper(void* arg)
    {
        struct daft_thread_t* wrapper = arg;
        wrapper->tid = gettid();
        return (*wrapper->start_routine)(wrapper->start_routine_arg);
    }

    static void start_daft_thread(threadproc_t proc, void* arg)
    {
        struct daft_thread_t* info;
        pthread_mutex_lock(&g_thread_accounting.info_lock);
        assert (g_thread_accounting.daft_thread_count < MAX_DAFT_THREADS);
        info = &g_thread_accounting.daft_threads[g_thread_accounting.daft_thread_count++];
        pthread_mutex_unlock(&g_thread_accounting.info_lock);
        info->start_routine = proc;
        info->start_routine_arg = arg;
        CHECK_PTHREAD(pthread_create(&info->pthread, NULL, daft_thread_wrapper, info), abort());
    }

    static struct daft_thread_t* find_thread_by_tid(pid_t thread_id)
    {
        size_t k;
        struct daft_thread_t* info = NULL;
        pthread_mutex_lock(&g_thread_accounting.info_lock);
        for (k = 0; k < g_thread_accounting.daft_thread_count; ++k) {
            if (g_thread_accounting.daft_threads[k].tid == thread_id) {
                info = &g_thread_accounting.daft_threads[k];
                break;
            }
        }
        pthread_mutex_unlock(&g_thread_accounting.info_lock);
        return info;
    }

    static void restart_daft_thread(struct daft_thread_t* info)
    {
        void* unused;
        CHECK_PTHREAD(pthread_cancel(info->pthread), abort());
        CHECK_PTHREAD(pthread_join(info->pthread, &unused), abort());
        info->tid = 0;
        CHECK_PTHREAD(pthread_create(&info->pthread, NULL, daft_thread_wrapper, info), abort());
    }

    /************* signal handling stuff **************/
    struct sigdeath_notify_info {
        int signum;
        pid_t tid;
    };

    static void sigdeath_handler(int signum, siginfo_t* info, void* ctx)
    {
        ssize_t z;
        struct sigdeath_notify_info inf = { .signum = signum, .tid = gettid() };
        (void) info;
        (void) ctx;
        z = write(g_thread_accounting.monitor_pipe[1], &inf, sizeof(inf));
        assert (z == (ssize_t) sizeof(inf)); // or else SIGABRT. Are we handling that too? Hope not.
        pause(); // returning doesn't do us any good.
    }

    static void register_signal_handlers(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = sigdeath_handler;
        sa.sa_flags = SA_SIGINFO;
        CHECK_OSCALL(sigaction(SIGSEGV, &sa, NULL), abort());
        CHECK_OSCALL(sigaction(SIGBUS, &sa, NULL), abort());
    }

    pid_t gettid(void) { return (pid_t) syscall(SYS_gettid); }

    /** This is the code that segfaults randomly. Kwality with a 'k'. */
    static void* someone_please_fix_me(void* arg)
    {
        char* i_think_this_address_looks_nice = (char*) 42;
        (void) arg;
        sleep(1 + rand() % 200);
        i_think_this_address_looks_nice[0] = 'q'; // ugh
        return NULL;
    }

    // main() will serve as the monitor thread here
    int main(void)
    {
        int k;
        struct sigdeath_notify_info death;

        daft_thread_accounting_info_init(&g_thread_accounting);
        register_signal_handlers();
        for (k = 0; k < 200; ++k) {
            start_daft_thread(someone_please_fix_me, (void*) (intptr_t) k);
        }
        while (read(g_thread_accounting.monitor_pipe[0], &death, sizeof(death)) == sizeof(death)) {
            struct daft_thread_t* info = find_thread_by_tid(death.tid);
            if (info == NULL) {
                fprintf(stderr, "*** thread_id %d not found\n", (int) death.tid);
                continue;
            }
            fprintf(stderr, "Thread %d (%d) died of %d, restarting.\n",
                    (int) death.tid, (int) (intptr_t) info->start_routine_arg, death.signum);
            restart_daft_thread(info);
        }
        fprintf(stderr, "Shouldn't get here.\n");
        return 0;
    }

If you have not thought this through: trying to recover from SIGSEGV is extremely risky, and I strongly advise against it. Threads share an address space. A thread that segfaulted may also have corrupted other threads' data or global bookkeeping data, such as malloc()'s accounting. A safer approach, assuming the faulting code is irreparably broken but must be used, is to quarantine it behind a process boundary, for example by calling fork() before invoking the broken code. You then have to catch SIGCLD and handle the child either crashing or terminating normally, along with a number of other pitfalls, but at least you do not have to worry about random corruption. Of course, the best option is to fix the bloody code so that you do not see segfaults at all.
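The fork() quarantine could be sketched roughly as follows. The names risky_fn, run_quarantined, well_behaved and crashes are hypothetical, and the sketch waits for the child synchronously with waitpid() rather than catching SIGCLD, which is a simplification of what the answer describes. raise(SIGSEGV) stands in for the broken code's bad memory access.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*risky_fn)(void *arg); /* hypothetical callback type */

/* Runs fn(arg) in a child process.
   Returns the child's exit status (0..255) if it exited normally,
   -signo if a signal killed it, or -1000 if fork/waitpid failed. */
int run_quarantined(risky_fn fn, void *arg)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1000;
    if (pid == 0) {
        /* Child: restore default actions so a crash really kills it. */
        signal(SIGSEGV, SIG_DFL);
        signal(SIGBUS, SIG_DFL);
        _exit(fn(arg) & 0xff);
    }
    /* Parent: wait for the child, retrying if interrupted by a signal. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1000;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);
    return -1000;
}

/* Demo callbacks standing in for real code. */
static int well_behaved(void *arg) { (void)arg; return 7; }
static int crashes(void *arg)      { (void)arg; raise(SIGSEGV); return 0; }
```

With this sketch, run_quarantined(well_behaved, NULL) yields the child's exit status and run_quarantined(crashes, NULL) yields the negated signal number, while the parent's memory stays untouched. One of the pitfalls in a multithreaded parent is that fork() duplicates only the calling thread, so the child should restrict itself to async-signal-safe calls unless it execs a fresh program.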

+4

In my experience, when a multithreaded program receives a synchronous signal, that is, one generated by something the program itself was executing, such as dereferencing a bad pointer, the thread that caused the problem is the one that receives the signal.

I have used one system that explicitly guaranteed this behavior, but I do not know whether it is universal. Also, of course, if the offending thread has the signal blocked, as in the paradigm where one thread handles all signals, the signal will presumably go to the signal-handling thread.

0

Source: https://habr.com/ru/post/988210/

