Glibc, possible race condition between closing FILE * during exit?

Question

I reduced a large program to the code shown below. Running this program under valgrind will eventually report this error:

 ==7234== Invalid read of size 4
 ==7234==    at 0x34A7275FC8: _IO_file_write@@GLIBC_2.2.5 (in /usr/lib64/libc-2.15.so)
 ==7234==    by 0x34A7275EA1: new_do_write (in /usr/lib64/libc-2.15.so)
 ==7234==    by 0x34A7276D44: _IO_do_write@@GLIBC_2.2.5 (in /usr/lib64/libc-2.15.so)
 ==7234==    by 0x34A7278DB6: _IO_flush_all_lockp (in /usr/lib64/libc-2.15.so)
 ==7234==    by 0x34A7278F07: _IO_cleanup (in /usr/lib64/libc-2.15.so)
 ==7234==    by 0x34A7238BBF: __run_exit_handlers (in /usr/lib64/libc-2.15.so)
 ==7234==    by 0x34A7238BF4: exit (in /usr/lib64/libc-2.15.so)
 ==7234==    by 0x34A722173B: (below main) (in /usr/lib64/libc-2.15.so)
 ==7234==  Address 0x542f2e0 is 0 bytes inside a block of size 568 free'd
 ==7234==    at 0x4A079AE: free (vg_replace_malloc.c:427)
 ==7234==    by 0x34A726B11C: fclose@@GLIBC_2.2.5 (in /usr/lib64/libc-2.15.so)
 ==7234==    by 0x40087C: writer (t.c:22)
 ==7234==    by 0x34A7607D13: start_thread (in /usr/lib64/libpthread-2.15.so)
 ==7234==    by 0x34A72F167C: clone (in /usr/lib64/libc-2.15.so)

From the output above, this appears to be what happens:

main() returns and runs the exit handlers, which flush and close all open FILE * streams
the writer() thread, still running, wakes up and closes its FILE *
the exit handler then tries to access that closed FILE *, which is now invalid / free()'d

As far as I can tell, the test program does nothing undefined, but I would be glad to be proven wrong.

Valgrind hooks various functions, so it's possible that this is a valgrind bug rather than a glibc bug.

Is this a glibc bug?
Or is it a valgrind bug?
Any ideas on how to determine whether it is valgrind or glibc?

t.c:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

void *test(void *arg)
{
    return NULL;
}

void *writer(void *arg)
{
    for(;;) {
        char a[100];
        FILE *f = fopen("out", "w");
        if(f == NULL)
            abort();
        fputs("Test", f);
        if(fgets(a, 100, stdin))
            fputs(a, f);
        fclose(f); //line 22 in the valgrind report
    }
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t tid1,tid2;
    pthread_create(&tid1, NULL, writer, NULL);
    pthread_create(&tid2, NULL, test, NULL);
    pthread_join(tid2, NULL);
    //pthread_join(tid1, NULL); //no bug if we wait for writer()

    return 0;
}
//compile: gcc t.c -g -pthread

It may take several minutes to trigger the valgrind error, with:

 while [ true ]; do
   echo test | valgrind --error-exitcode=2 ./a.out || break
 done

Environment: Fedora 17, glibc-2.15, gcc-4.7.0-5, kernel 3.5.3-1.fc17.x86_64, valgrind-3.7.0-4

c multithreading linux

nos Sep 23 '12 at 4:20

1 answer

David Schwartz · Answer 1 · 2012-09-23T04:38:46+0000

You have a race condition. You have a thread that calls exit, which is documented to close all open stdio streams. Then you have another thread that, possibly after exit has closed it, accesses such a stream. You cannot access a FILE * after it is closed; it is allowed to point to garbage.

If a thread is doing something that makes calling exit unsafe, you must make sure that you do not call exit. It's really that simple.
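
One way to follow this advice is to make sure no thread is still using stdio by the time main returns. The block below is a minimal sketch under stated assumptions, not the original program: the stop flag `stop_requested` and the helper `run_writer_and_join` are hypothetical names introduced here, and the blocking `fgets` on stdin is dropped because a thread blocked in a read cannot notice the flag.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stop flag; not part of the program in the question. */
static atomic_bool stop_requested;

/* Same open/write/close cycle as the question's writer(), but it checks
 * the flag after every round instead of looping forever. */
static void *writer(void *arg)
{
    long *rounds = arg;
    do {
        FILE *f = fopen("out", "w");
        if (f == NULL)
            abort();
        fputs("Test", f);
        fclose(f);
        ++*rounds;
    } while (!atomic_load(&stop_requested));
    return NULL;
}

/* Start the writer, ask it to stop, and wait for it to finish.
 * After pthread_join() returns, the writer no longer touches any FILE *,
 * so the caller can return from main() (which calls exit) safely.
 * Returns the number of completed write rounds (at least 1). */
long run_writer_and_join(void)
{
    pthread_t tid;
    long rounds = 0;

    atomic_store(&stop_requested, false);
    if (pthread_create(&tid, NULL, writer, &rounds) != 0)
        abort();
    atomic_store(&stop_requested, true);
    if (pthread_join(tid, NULL) != 0)
        abort();
    return rounds;
}
```

Calling `run_writer_and_join()` from main just before `return 0` means the exit-time flush only runs after the writer's last `fclose` has completed. This is the same idea as the commented-out `pthread_join(tid1, NULL)` in the question, with a stop condition added so that the join can actually return.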
