Note: I'm a little silly about this, but it might help someone
So, I am trying to improve program performance using parallelism. However, I ran into a problem with measured acceleration. I have 4 processors:
~% lscpu
...
CPU(s): 4
...
However, the acceleration is much lower than four times. The following is a minimal working example with a serial version, a version using OpenMP and a version using POSIX streams (to make sure this is not due to an implementation).
Purely Serial ( add_seq.c):
#include <stddef.h>
int main() {
for (size_t i = 0; i < (1ull<<36); i += 1) {
__asm__("add $0x42, %%eax" : : : "eax");
}
return 0;
}
OpenMP ( add_omp.c):
#include <stddef.h>
int main() {
#pragma omp parallel for schedule(static)
for (size_t i = 0; i < (1ull<<36); i += 1) {
__asm__("add $0x42, %%eax" : : : "eax");
}
return 0;
}
POSIX threads ( add_pthread.c):
#include <pthread.h>
#include <stddef.h>
void* f(void* x) {
(void) x;
const size_t count = (1ull<<36) / 4;
for (size_t i = 0; i < count; i += 1) {
__asm__("add $0x42, %%eax" : : : "eax");
}
return NULL;
}
int main() {
pthread_t t[4];
for (size_t i = 0; i < 4; i += 1) {
pthread_create(&t[i], NULL, f, NULL);
}
for (size_t i = 0; i < 4; i += 1) {
pthread_join(t[i], NULL);
}
return 0;
}
Makefile:
CFLAGS := -O3 -fopenmp
LDFLAGS := -O3 -lpthread
all: add_seq add_omp add_pthread
So now, by running this (using zsh built-in time):
% make -B && time ./add_seq && time ./add_omp && time ./add_pthread
cc -O3 -fopenmp -O3 -lpthread add_seq.c -o add_seq
cc -O3 -fopenmp -O3 -lpthread add_omp.c -o add_omp
cc -O3 -fopenmp -O3 -lpthread add_pthread.c -o add_pthread
./add_seq 24.49s user 0.00s system 99% cpu 24.494 total
./add_omp 52.97s user 0.00s system 398% cpu 13.279 total
./add_pthread 52.92s user 0.00s system 398% cpu 13.266 total
, 2,90 , ( ) 2,60 . , :
>>> 24.494 * 2.9
71.0326
>>> 13.279 * 2.6
34.5254
>>> 13.266 * 2.6
34.4916
, , , , . ?
: asm_omp.c , for, , ZF;