Measuring Mutex and Busy-Waiting Performance

The program creates several threads, each of which increments a shared total variable by 10000 using a for loop that adds 1 on each iteration. Both a mutex version and a spin lock (busy-waiting) version are required. From what I have read, the mutex version should be faster than busy waiting, but my implementation gives me the opposite result...

This is the implementation of each thread in the mutex version:

 void *incr(void *tid)
 {
     int i;
     for (i = 0; i < 10000; i++) {
         pthread_mutex_lock(&the_mutex);   // Grab the lock
         sharedVar++;                      // Increment the shared variable
         pthread_mutex_unlock(&the_mutex); // Release the lock
     }
     pthread_exit(0);
 }

And this is the implementation in the spin lock version:

 void *incr(void *tid)
 {
     int i;
     for (i = 0; i < 10000; i++) {
         enter_region((int)tid);  // Grab the lock
         sharedVar++;             // Increment the shared variable
         leave_region((int)tid);  // Release the lock
     }
     pthread_exit(0);
 }

 void enter_region(int tid)
 {
     interested[tid] = true;                       // Show this thread is interested
     turn = tid;                                   // Set flag
     while (turn == tid && other_interested(tid)); // Busy waiting
 }

 bool other_interested(int tid)  // interested[] is initialized to all false
 {
     int i;
     for (i = 0; i < tNumber; i++)
         if (i != tid)
             if (interested[i] == true)  // Another thread is interested
                 return true;
     return false;
 }

 void leave_region(int tid)
 {
     interested[tid] = false;  // Depart from the critical region
 }

I also repeated the process of creating and joining the threads hundreds of times so that the runtimes are clearly distinguishable. For example, with tNumber set to 4 and 1000 repetitions, the mutex version takes 2.22 seconds while the spin lock version takes 1.35 seconds. The difference grows with the number of repetitions. Why is this happening? Is my code wrong?
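
For reference, here is a minimal sketch of the timing harness I am describing, for the mutex version (T_NUMBER, REPEATS, and the use of clock_gettime are my choices; the actual driver code is not shown above):

 #include <pthread.h>
 #include <stdio.h>
 #include <time.h>

 #define T_NUMBER 4    /* thread count */
 #define REPEATS  1000 /* how many times the whole experiment is repeated */

 int sharedVar;
 pthread_mutex_t the_mutex = PTHREAD_MUTEX_INITIALIZER;

 void *incr(void *tid); /* one of the two versions above */

 int main(void)
 {
     pthread_t threads[T_NUMBER];
     struct timespec start, end;
     long r, i;

     clock_gettime(CLOCK_MONOTONIC, &start);
     for (r = 0; r < REPEATS; r++) {
         sharedVar = 0;
         for (i = 0; i < T_NUMBER; i++)
             pthread_create(&threads[i], NULL, incr, (void *)i);
         for (i = 0; i < T_NUMBER; i++)
             pthread_join(threads[i], NULL);
     }
     clock_gettime(CLOCK_MONOTONIC, &end);

     printf("elapsed: %.2f s\n",
            (end.tv_sec - start.tv_sec) +
            (end.tv_nsec - start.tv_nsec) / 1e9);
     return 0;
 }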

2 answers

The code between enter_region and leave_region is not actually mutually exclusive.

You can prove it by making the work inside the critical section more elaborate, so that any violation becomes visible.

Create an array of bools, check, of length 10000, initialized to false. Make the code between enter_region and leave_region:

 if (check[sharedVar])
     cout << "ERROR" << endl;
 else
     check[sharedVar++] = true;
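
A sketch of how this check could be wired into the thread function, in C rather than the C++ cout above (printf is my substitution). I size the array as tNumber * 10000 on the assumption that sharedVar ranges over all threads' increments combined, and I reuse the spin lock globals and functions from the question:

 #include <pthread.h>
 #include <stdbool.h>
 #include <stdio.h>

 #define T_NUMBER 4              /* assumed thread count */

 extern int sharedVar;           /* the shared counter from the question */
 void enter_region(int tid);     /* the spin lock from the question */
 void leave_region(int tid);

 /* One slot per value sharedVar can take; static storage, so all false. */
 static bool check[T_NUMBER * 10000];

 void *incr(void *tid)
 {
     int i;
     for (i = 0; i < 10000; i++) {
         enter_region((int)(long)tid);
         if (check[sharedVar])          /* value already claimed: two threads */
             printf("ERROR\n");         /* were inside the region at once    */
         else
             check[sharedVar++] = true; /* claim the current counter value */
         leave_region((int)(long)tid);
     }
     pthread_exit(0);
 }

With the spin lock version this will typically print ERROR (and sharedVar ends up short of tNumber * 10000); with the mutex version it stays silent.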

The "difference" in speed comes from the fact that you synchronize with

 interested[tid] = true;                       // Show this thread is interested
 turn = tid;                                   // Set flag
 while (turn == tid && other_interested(tid)); // Busy waiting

which is a sequence of separate operations. A thread can be preempted in the middle of it, and the next thread then reads an inconsistent state.
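
One possible failing interleaving with three threads (my reconstruction; thread ids 1, 2, 3):

 T1: interested[1] = true; turn = 1;
 T2: interested[2] = true; turn = 2;
 T1: checks the loop: turn == 1 is now false -> T1 enters the critical section
 T3: interested[3] = true; turn = 3;
 T2: checks the loop: turn == 2 is now false -> T2 enters the critical section

Now T1 and T2 are both inside, so mutual exclusion is violated.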

This needs to be done atomically, using either compare-and-swap or test-and-set. These instructions are usually provided by the hardware.

For example, on x86 you have xchg, cmpxchg/cmpxchg8b, and xadd. Your test can be rewritten as:

 while (compare_and_swap_atomic(myid, id_meaning_it_is_free) == false);
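
As a concrete illustration (a sketch, not the poster's code), a correct spin lock can be built on C11's atomic_flag, whose test-and-set maps to exactly such a hardware instruction:

 #include <stdatomic.h>

 static atomic_flag lock = ATOMIC_FLAG_INIT;

 void enter_region(int tid)
 {
     (void)tid; /* the flag itself carries all the state */
     /* Atomically set the flag and get its previous value;
        spin until we observe that it was clear. */
     while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
         ; /* busy wait */
 }

 void leave_region(int tid)
 {
     (void)tid;
     atomic_flag_clear_explicit(&lock, memory_order_release);
 }

The acquire/release ordering also fixes the visibility problem that the plain loads and stores of interested[] and turn have.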

The problem is that atomicity is expensive.

