I am new to multithreading and try to learn it with a simple program that adds from 1 to n and returns the amount. In the sequential case, main
calls the sumFrom1
function twice for n = 1e5 and 2e5; in multi-threaded cases, two threads are created using pthread_create
, and two sums are computed in a separate thread. The multithreading version is much slower than the sequential version (see Results below). I run this on a platform with 12 processors, and there is no communication between threads.
Multithreaded:
Thread 1 returns: 0 Thread 2 returns: 0 sum of 1..10000: 50005000 sum of 1..20000: 200010000 time: 156 seconds
Sequential:
sum of 1..10000: 50005000 sum of 1..20000: 200010000 time: 56 seconds
When I add -O2 to compilation, the time of the multi-threaded version (9) is less than that of the serial version (11), but not as much as I expect. I always have the -O2 flag, but Iโm interested in learning about the low multithreading speed in an unoptimized case. Should it be slower than the serial version? If not, what can I do to make it faster?
The code:
#include <stdio.h> #include <pthread.h> #include <time.h> typedef struct my_struct { int n; int sum; }my_struct_t; void *sumFrom1(void* sit) { my_struct_t* local_sit = (my_struct_t*) sit; int i; int nsim = 500000; // Loops for consuming time int j; for(j = 0; j < nsim; j++) { local_sit->sum = 0; for(i = 0; i <= local_sit->n; i++) local_sit->sum += i; } } int main(int argc, char *argv[]) { pthread_t thread1; pthread_t thread2; my_struct_t si1; my_struct_t si2; int iret1; int iret2; time_t t1; time_t t2; si1.n = 10000; si2.n = 20000; if(argc == 2 && atoi(argv[1]) == 1) // Use "./prog 1" to test the time of multithreaded version { t1 = time(0); iret1 = pthread_create(&thread1, NULL, sumFrom1, (void*)&si1); iret2 = pthread_create(&thread2, NULL, sumFrom1, (void*)&si2); pthread_join(thread1, NULL); pthread_join(thread2, NULL); t2 = time(0); printf("Thread 1 returns: %d\n",iret1); printf("Thread 2 returns: %d\n",iret2); printf("sum of 1..%d: %d\n", si1.n, si1.sum); printf("sum of 1..%d: %d\n", si2.n, si2.sum); printf("time: %d seconds", t2 - t1); } else // Use "./prog" to test the time of sequential version { t1 = time(0); sumFrom1((void*)&si1); sumFrom1((void*)&si2); t2 = time(0); printf("sum of 1..%d: %d\n", si1.n, si1.sum); printf("sum of 1..%d: %d\n", si2.n, si2.sum); printf("time: %d seconds", t2 - t1); } return 0; }
Update1:
After a little search on โfalse communicationโ (thanks, Martin James!), I think this is the main reason. There are (at least) two ways to fix this:
The first way is to insert a buffer zone between the two structures (thanks, @dasblinkenlight):
my_struct_t si1; char memHolder[4096]; my_struct_t si2;
Without -O2, the time is reduced from ~ 156 to ~ 38 s.
The second way is to avoid frequent updates to sit->sum
, which can be implemented using a temporary variable in sumFrom1
(as @Jens Gustedt replied):
for(int sum = 0, j = 0; j < nsim; j++) { sum = 0; for(i = 0; i <= local_sit->n; i++) sum += i; } local_sit->sum = sum;
Without -O2 , it decreases from ~ 156 s to ~ 35 s or ~ 109 from time to time (it has two peaks! I donโt know why.). At -O2, the time remains ~ 8 s.