Why in my case is multithreading slower than sequential programming?

I am new to multithreading and try to learn it with a simple program that adds from 1 to n and returns the amount. In the sequential case, main calls the sumFrom1 function twice for n = 1e5 and 2e5; in multi-threaded cases, two threads are created using pthread_create , and two sums are computed in a separate thread. The multithreading version is much slower than the sequential version (see Results below). I run this on a platform with 12 processors, and there is no communication between threads.

Multithreaded:

 Thread 1 returns: 0 Thread 2 returns: 0 sum of 1..10000: 50005000 sum of 1..20000: 200010000 time: 156 seconds 

Sequential:

 sum of 1..10000: 50005000 sum of 1..20000: 200010000 time: 56 seconds 

When I add -O2 to compilation, the time of the multi-threaded version (9) is less than that of the serial version (11), but not as much as I expect. I always have the -O2 flag, but Iโ€™m interested in learning about the low multithreading speed in an unoptimized case. Should it be slower than the serial version? If not, what can I do to make it faster?

The code:

 #include <stdio.h> #include <pthread.h> #include <time.h> typedef struct my_struct { int n; int sum; }my_struct_t; void *sumFrom1(void* sit) { my_struct_t* local_sit = (my_struct_t*) sit; int i; int nsim = 500000; // Loops for consuming time int j; for(j = 0; j < nsim; j++) { local_sit->sum = 0; for(i = 0; i <= local_sit->n; i++) local_sit->sum += i; } } int main(int argc, char *argv[]) { pthread_t thread1; pthread_t thread2; my_struct_t si1; my_struct_t si2; int iret1; int iret2; time_t t1; time_t t2; si1.n = 10000; si2.n = 20000; if(argc == 2 && atoi(argv[1]) == 1) // Use "./prog 1" to test the time of multithreaded version { t1 = time(0); iret1 = pthread_create(&thread1, NULL, sumFrom1, (void*)&si1); iret2 = pthread_create(&thread2, NULL, sumFrom1, (void*)&si2); pthread_join(thread1, NULL); pthread_join(thread2, NULL); t2 = time(0); printf("Thread 1 returns: %d\n",iret1); printf("Thread 2 returns: %d\n",iret2); printf("sum of 1..%d: %d\n", si1.n, si1.sum); printf("sum of 1..%d: %d\n", si2.n, si2.sum); printf("time: %d seconds", t2 - t1); } else // Use "./prog" to test the time of sequential version { t1 = time(0); sumFrom1((void*)&si1); sumFrom1((void*)&si2); t2 = time(0); printf("sum of 1..%d: %d\n", si1.n, si1.sum); printf("sum of 1..%d: %d\n", si2.n, si2.sum); printf("time: %d seconds", t2 - t1); } return 0; } 

Update1:

After a little search on โ€œfalse communicationโ€ (thanks, Martin James!), I think this is the main reason. There are (at least) two ways to fix this:

The first way is to insert a buffer zone between the two structures (thanks, @dasblinkenlight):

 my_struct_t si1; char memHolder[4096]; my_struct_t si2; 

Without -O2, the time is reduced from ~ 156 to ~ 38 s.

The second way is to avoid frequent updates to sit->sum , which can be implemented using a temporary variable in sumFrom1 (as @Jens Gustedt replied):

 for(int sum = 0, j = 0; j < nsim; j++) { sum = 0; for(i = 0; i <= local_sit->n; i++) sum += i; } local_sit->sum = sum; 

Without -O2 , it decreases from ~ 156 s to ~ 35 s or ~ 109 from time to time (it has two peaks! I donโ€™t know why.). At -O2, the time remains ~ 8 s.

+6
source share
1 answer

Changing the code to

 typedef struct my_struct { size_t n; size_t sum; }my_struct_t; void *sumFrom1(void* sit) { my_struct_t* local_sit = sit; size_t nsim = 500000; // Loops for consuming time size_t n = local_sit->n; size_t sum = 0; for(size_t j = 0; j < nsim; j++) { for(size_t i = 0; i <= n; i++) sum += i; } local_sit->sum = sum; return 0; } 

the phenomenon disappears. The problems you had:

  • using int , because the data type is completely wrong for such a test. Your numbers, where such an amount is full. Overflow of signed types is undefined behavior. You are lucky that he does not eat your lunch.
  • with restrictions and summing variables with indirectness, additional loads and stores buy you, which in the case of -O0 really executed as such, with all the consequences of a false exchange and the like.

Other errors have also been detected in your code:

  • Missing include for atoi
  • super-threaded cast in and out of void*
  • print time_t as int

Please compile your code with -Wall before publishing.

+3
source

Source: https://habr.com/ru/post/912916/


All Articles