Recently at work I needed to implement a counter in a multi-threaded program. I found that with my GCC (3.4.5) there is a user-space data type called atomic_t. But it is apparently not atomic.
I tested atomic_inc() / atomic_read() on an x86_64 machine with 12 cores, running Linux kernel 2.6.9.
Here is a demo. I use pthread_cond_t and pthread_cond_broadcast so that all threads start together, to increase the degree of concurrency.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <pthread.h>
#include <asm/atomic.h>

atomic_t atomic_count;
pthread_cond_t g_cond;
pthread_mutex_t g_mutex;

void* thread_func(void*p)
{
    pthread_mutex_t* lock = (pthread_mutex_t*)p;
    pthread_mutex_lock(lock);
    pthread_cond_wait(&g_cond, lock);
    pthread_mutex_unlock(lock);
    for (int i=0;i<20000;++i){
        atomic_inc(&atomic_count);
    }
    return NULL;
}

#define THRD_NUM 15

int main()
{
    atomic_set(&atomic_count, 0);
    pthread_cond_init(&g_cond, NULL);
    pthread_mutex_init(&g_mutex, NULL);
    pthread_t pid[THRD_NUM];
    for (int i=0; i<THRD_NUM; i++) {
        pthread_create(&pid[i], NULL, thread_func, &g_mutex);
    }
    sleep(3);
    pthread_cond_broadcast(&g_cond);
    for (int i=0; i<THRD_NUM; i++) {
        pthread_join(pid[i], NULL);
    }
    long ans = atomic_read(&atomic_count);
    printf("atomic_count:%ld \n", ans);
}
The expected result is 300,000 (15 threads, 20,000 increments each), but I always get 270,000+ or 280,000+ instead. I found this implementation of atomic_inc():
static __inline__ void atomic_inc(atomic_t *v)
{
    __asm__ __volatile__(
        LOCK "incl %0"
        :"=m" (v->counter)
        :"m" (v->counter));
}
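One thing that may be worth checking here, stated as an assumption rather than something verified on this machine: in kernel headers of this era, LOCK is a macro that expands to the lock prefix only when CONFIG_SMP is defined, and a user-space build normally does not define it. The sketch below mimics that pattern with a hypothetical macro MY_LOCK and hypothetical helpers inc_template() and has_lock_prefix(), to show which instruction string the compiler would end up with.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the header's LOCK macro. The real header is
   assumed (not verified here) to use the same #ifdef CONFIG_SMP pattern. */
#ifdef CONFIG_SMP
#define MY_LOCK "lock ; "
#else
#define MY_LOCK ""
#endif

/* Returns the assembler template an atomic_inc()-style function would
   expand to under the current build flags. */
const char *inc_template(void)
{
    return MY_LOCK "incl %0";
}

/* Returns 1 if the template carries a lock prefix, 0 otherwise. */
int has_lock_prefix(void)
{
    return strstr(inc_template(), "lock") != NULL;
}
```

Built without -DCONFIG_SMP, has_lock_prefix() returns 0, meaning the increment would be a plain incl with no lock prefix. Running the real program through gcc -E or gcc -S would show whether the prefix is actually present in this case.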
According to Intel, the LOCK prefix has full-barrier semantics. Does that mean the wrong output has nothing to do with instruction reordering?
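For comparison, here is a minimal sketch of the same kind of test written with C11 <stdatomic.h>. This is an assumption about what is available: GCC 3.4.5 predates it, so it needs a much newer compiler. The names run_counter(), worker() and MAX_THREADS are hypothetical, and the condition-variable start barrier from the demo is left out to keep it short.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 64               /* hypothetical upper bound */

atomic_long counter;                 /* shared counter, zero-initialized */

struct worker_arg { int iters; };

void *worker(void *p)
{
    const struct worker_arg *arg = p;
    for (int i = 0; i < arg->iters; ++i)
        atomic_fetch_add(&counter, 1);   /* atomic read-modify-write */
    return NULL;
}

/* Runs nthreads workers and returns the final count, or -1 if nthreads
   is out of range or a thread could not be created. */
long run_counter(int nthreads, int iters)
{
    pthread_t tid[MAX_THREADS];
    struct worker_arg arg = { iters };

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    atomic_store(&counter, 0);
    for (int i = 0; i < nthreads; ++i) {
        if (pthread_create(&tid[i], NULL, worker, &arg) != 0) {
            /* Join the threads that did start before giving up. */
            for (int j = 0; j < i; ++j)
                pthread_join(tid[j], NULL);
            return -1;
        }
    }
    for (int i = 0; i < nthreads; ++i)
        pthread_join(tid[i], NULL);

    return atomic_load(&counter);
}
```

Calling run_counter(15, 20000) corresponds to the 15 x 20,000 case above. If this version reaches the full count on the same machine while the atomic_t version does not, that would point at the increment itself rather than at reordering.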
I also noticed something interesting: if I set THRD_NUM below 12 (the number of cores on my machine), the error gets smaller. I suspect context switching is involved, but I have no idea how it would cause this. Can anybody help? Thanks!
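Whatever role context switching plays, a shortfall like this is what lost updates look like. An increment that is not atomic is a load, an add and a store, and two threads can interleave those steps. The sketch below replays one such interleaving deterministically in a single thread. It is only an illustration with a hypothetical function name, not a model of what the hardware actually did here.

```c
/* Replays one interleaving of two non-atomic increments on the same word. */
int lost_update_demo(void)
{
    int shared = 0;

    int a = shared;     /* thread A loads 0 */
    int b = shared;     /* thread B loads 0 before A has stored */
    shared = a + 1;     /* thread A stores 1 */
    shared = b + 1;     /* thread B stores 1, overwriting A's update */

    return shared;      /* 1, although two increments ran */
}
```

Each interleaving of this kind costs one count, which is consistent with the total coming out below 300,000 rather than above it.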