C++ pthread multithreading for 2x Intel Xeon X5570 (quad-core) processors on an Amazon EC2 HPC Ubuntu instance

I wrote a program that uses multithreading for parallel computation. I verified that on my own system (OS X) it keeps both cores busy at the same time. I then ported it to Ubuntu without any changes, since I had written it with that platform in mind. Specifically, I am running the Canonical Oneiric HVM image on an Amazon EC2 Cluster Compute 4XL instance. These machines have two quad-core Intel Xeon X5570 processors.

Unfortunately, my program does not scale across threads on the EC2 machine. Running more than one thread actually slows the computation down, and each additional thread makes it slower still. When I run the program with more than one thread, the system CPU usage (%sy) rises roughly in proportion to the number of threads; with a single thread, %sy is about 0.1. In every case, user CPU usage (%us) never exceeds about 9%.

Below are the threading-related sections of my code:

```cpp
const int NUM_THREADS = N; //where changing N is how I set the # of threads

void Threading::Setup_Threading()
{
    sem_unlink("producer_gate");
    sem_unlink("consumer_gate");
    producer_gate = sem_open("producer_gate", O_CREAT, 0700, 0);
    consumer_gate = sem_open("consumer_gate", O_CREAT, 0700, 0);
    completed = 0;
    queued = 0;
    pthread_attr_init (&attr);
    pthread_attr_setdetachstate (&attr, PTHREAD_CREATE_DETACHED);
}

void Threading::Init_Threads(vector <NetClass> * p_Pop)
{
    thread_list.assign(NUM_THREADS, pthread_t());
    for(int q=0; q<NUM_THREADS; q++)
        pthread_create(&thread_list[q], &attr, Consumer, (void*) p_Pop );
}

void* Consumer(void* argument)
{
    std::vector <NetClass>* p_v_Pop = (std::vector <NetClass>*) argument ;
    while(1)
    {
        sem_wait(consumer_gate);
        pthread_mutex_lock (&access_queued);
        int index = queued;
        queued--;
        pthread_mutex_unlock (&access_queued);

        Run_Gen( (*p_v_Pop)[index-1] );

        completed--;
        if(!completed)
            sem_post(producer_gate);
    }
}

main()
{
    ...
    t1 = time(NULL);
    threads.Init_Threads(p_Pop_m);
    for(int w = 0; w < MONTC_NUM_TRIALS ; w++)
    {
        queued = MONTC_POP;
        completed = MONTC_POP;
        for(int q = MONTC_POP-1 ; q > -1; q--)
            sem_post(consumer_gate);
        sem_wait(producer_gate);
    }
    threads.Close_Threads();
    t2 = time(NULL);
    cout << difftime(t2, t1);
    ...
}
```
1 answer

OK, this is just a guess. There is an easy way to turn parallel code into serial code without meaning to. For example:

```cpp
void *thread_func(void *arg)
{
    while (1) {
        pthread_mutex_lock(m1);
        //do something
        pthread_mutex_unlock(m1);
        ...
        pthread_mutex_lock(mN);
        pthread_mutex_unlock(mN);
    }
}
```

If you run such code in multiple threads, you will not see any speedup, because of the mutexes. The code effectively runs serially rather than in parallel: only one thread does useful work at any given time.

The bad part is that you can avoid mutexes entirely in your own code and still end up in this situation. For example, a call to "malloc" can take a mutex inside the C runtime, and a call to "write" can take a mutex somewhere in the Linux kernel. Even a gettimeofday call can lock and unlock mutexes (and it does, if we are talking about Linux/glibc).

You may have only one mutex, but if your threads spend a lot of time on it, that alone can produce this behavior.

And because such mutexes can be hidden somewhere in the kernel or in the C/C++ runtime, you can see different behavior on different operating systems.


Source: https://habr.com/ru/post/1382588/

