Lower than expected speedup when using multithreading

Note: I feel a little silly about this, but it might help someone.

So, I am trying to improve a program's performance using parallelism. However, I have run into a problem with the measured speedup. I have 4 CPUs:

~% lscpu
...
CPU(s):                4
...

However, the speedup is much lower than four times. The following is a minimal working example with a sequential version, a version using OpenMP and a version using POSIX threads (to make sure the problem is not specific to one implementation).

Purely sequential (add_seq.c):

#include <stddef.h>

int main() {
    for (size_t i = 0; i < (1ull<<36); i += 1) {
        __asm__("add $0x42, %%eax" : : : "eax");
    }
    return 0;
}

OpenMP (add_omp.c):

#include <stddef.h>

int main() {
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < (1ull<<36); i += 1) {
        __asm__("add $0x42, %%eax" : : : "eax");
    }
    return 0;
}

POSIX threads (add_pthread.c):

#include <pthread.h>
#include <stddef.h>

void* f(void* x) {
    (void) x;
    const size_t count = (1ull<<36) / 4;
    for (size_t i = 0; i < count; i += 1) {
        __asm__("add $0x42, %%eax" : : : "eax");
    }
    return NULL;
}
int main() {
    pthread_t t[4];
    for (size_t i = 0; i < 4; i += 1) {
        pthread_create(&t[i], NULL, f, NULL);
    }
    for (size_t i = 0; i < 4; i += 1) {
        pthread_join(t[i], NULL);
    }
    return 0;
}
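The pthread version above hardcodes four threads and divides the iteration count by 4, which only works because 2^36 happens to divide evenly. A minimal sketch that takes the thread count as a parameter (so the same binary can be timed with 1, 2 and 4 threads) might look like the following. The names run_parallel_adds and struct work are hypothetical, and instead of the x86-specific `add $0x42, %eax` it uses an empty `__asm__ volatile` (a GCC/Clang extension) to stop the compiler from folding the loop away, so the per-iteration work is similar but not identical.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread work descriptor. */
struct work {
    uint64_t iters; /* iterations assigned to this thread */
    uint64_t done;  /* iterations actually executed, to check the split */
};

static void *worker(void *arg) {
    struct work *w = arg;
    uint64_t acc = 0;
    for (uint64_t i = 0; i < w->iters; i++) {
        acc += 0x42;
        /* Empty asm (GCC/Clang extension) hides acc from the optimizer,
           so the loop is not replaced by a single multiplication. */
        __asm__ volatile("" : "+r"(acc));
    }
    w->done = w->iters;
    return NULL;
}

/* Split `total` iterations over `nthreads` threads, giving one extra
   iteration to each of the first (total % nthreads) threads. Returns the
   number of iterations actually executed, which is less than `total`
   if allocation or thread creation fails. */
uint64_t run_parallel_adds(unsigned nthreads, uint64_t total) {
    if (nthreads == 0) return 0;
    pthread_t *tids = calloc(nthreads, sizeof *tids);
    struct work *w = calloc(nthreads, sizeof *w);
    if (!tids || !w) { free(tids); free(w); return 0; }

    unsigned started = 0;
    for (unsigned t = 0; t < nthreads; t++) {
        w[t].iters = total / nthreads + (t < total % nthreads ? 1 : 0);
        if (pthread_create(&tids[t], NULL, worker, &w[t]) != 0) break;
        started++;
    }

    uint64_t sum = 0;
    for (unsigned t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        sum += w[t].done;
    }
    free(tids);
    free(w);
    return sum;
}
```

Calling run_parallel_adds(n, 1ull << 36) from a small main and timing it for n = 1, 2, 4 gives a speedup curve rather than a single data point; compile with `-pthread`.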

Makefile:

CFLAGS := -O3 -fopenmp
LDFLAGS := -O3 -lpthread  # just to be sure

all: add_seq add_omp add_pthread

So now, running this (using zsh's built-in time):

% make -B && time ./add_seq && time ./add_omp && time ./add_pthread
cc -O3 -fopenmp  -O3 -lpthread    add_seq.c   -o add_seq
cc -O3 -fopenmp  -O3 -lpthread    add_omp.c   -o add_omp
cc -O3 -fopenmp  -O3 -lpthread    add_pthread.c   -o add_pthread
./add_seq  24.49s user 0.00s system 99% cpu 24.494 total
./add_omp  52.97s user 0.00s system 398% cpu 13.279 total
./add_pthread  52.92s user 0.00s system 398% cpu 13.266 total

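Checking the CPU frequency, the sequential code runs at the maximum frequency (2.90 GHz), while the parallel code (all versions) runs at a uniform 2.60 GHz. So, converting wall-clock time to billions of clock cycles: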

>>> 24.494 * 2.9
71.0326
>>> 13.279 * 2.6
34.5254
>>> 13.266 * 2.6
34.4916

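So, all in all, the threaded code runs only about twice as fast as the sequential code, even though it keeps all four CPUs busy (398% CPU). Why?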
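The cycle arithmetic above can be wrapped in a couple of helpers that compare runs in elapsed clock cycles rather than seconds, which factors out the lower all-core frequency. This is a sketch assuming the clock stays at the observed frequency for the whole run; cycle_speedup and efficiency are hypothetical names, and efficiency here just means speedup divided by thread count.

```c
/* Elapsed work in billions of clock cycles: wall-clock seconds times GHz.
   Assumes the frequency is constant for the whole run. */
static double gigacycles(double seconds, double ghz) {
    return seconds * ghz;
}

/* Speedup measured in elapsed cycles instead of seconds. */
double cycle_speedup(double t_seq, double ghz_seq, double t_par, double ghz_par) {
    return gigacycles(t_seq, ghz_seq) / gigacycles(t_par, ghz_par);
}

/* Parallel efficiency: speedup divided by the number of threads used. */
double efficiency(double speedup, unsigned nthreads) {
    return speedup / nthreads;
}
```

With the measured numbers, cycle_speedup(24.494, 2.9, 13.279, 2.6) is about 2.06 (about 1.84 in plain wall-clock seconds), i.e. roughly 51% efficiency for four threads.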

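Note: I also verified that asm_omp.c was less efficient, since it implements the for loop by incrementing a counter and comparing it with the number of iterations, rather than decrementing it and checking ZF.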
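The two loop shapes mentioned in the note can be written in C as below. This is only a sketch: the function names are hypothetical, the empty `__asm__ volatile` is a GCC/Clang extension used to keep the add inside the loop, and whether the compiler actually emits a decrement followed by a jump on ZF for the second form is its own choice (inspect the output of `gcc -O3 -S` to check).

```c
#include <stdint.h>

/* Count-up form: compare i against n on every iteration. */
uint64_t adds_count_up(uint64_t n) {
    uint64_t acc = 0;
    for (uint64_t i = 0; i < n; i++) {
        acc += 0x42;
        __asm__ volatile("" : "+r"(acc)); /* keep the add in the loop */
    }
    return acc;
}

/* Count-down form: decrement toward zero, so the loop test can reuse the
   flags set by the decrement (ZF on x86) if the compiler chooses to. */
uint64_t adds_count_down(uint64_t n) {
    uint64_t acc = 0;
    for (uint64_t i = n; i != 0; i--) {
        acc += 0x42;
        __asm__ volatile("" : "+r"(acc));
    }
    return acc;
}
```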


Well, the answer is quite simple: there are really only two CPU cores:

% lscpu
...
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
...

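So, although htop shows four CPUs, two of them are virtual and exist only because of hyper-threading. Since the core idea of hyper-threading is sharing the resources of a single core between two threads, it does not help run similar code faster (it is only useful when the two threads use different resources).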
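Instead of reading lscpu by eye, a program can compare logical CPUs with physical cores itself. A minimal Linux-only sketch, assuming the sysfs files /sys/devices/system/cpu/cpuN/topology/core_id and physical_package_id are readable (core_id is only unique within a package, so it counts distinct (package, core) pairs). The function names are hypothetical, it returns -1 when the topology is unavailable (e.g. some containers), and `_SC_NPROCESSORS_ONLN`/`_SC_NPROCESSORS_CONF` are common extensions rather than strict POSIX.

```c
#include <stdio.h>
#include <unistd.h>

/* Logical CPUs currently online (hyper-threads count separately). */
long logical_cpus(void) {
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Read one integer from a sysfs file; returns -1 on failure. */
static long read_long(const char *path) {
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    long v = -1;
    if (fscanf(f, "%ld", &v) != 1) v = -1;
    fclose(f);
    return v;
}

/* Count distinct (physical_package_id, core_id) pairs; -1 if unknown. */
long physical_cores(void) {
    enum { MAX_CPUS = 1024 };
    static long pkg[MAX_CPUS], core[MAX_CPUS];
    long ncpu = sysconf(_SC_NPROCESSORS_CONF);
    long distinct = 0;
    char path[128];

    for (long cpu = 0; cpu < ncpu && cpu < MAX_CPUS; cpu++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%ld/topology/core_id", cpu);
        long c = read_long(path);
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%ld/topology/physical_package_id", cpu);
        long p = read_long(path);
        if (c < 0 || p < 0) continue; /* offline CPU or no topology info */

        int seen = 0;
        for (long j = 0; j < distinct; j++)
            if (pkg[j] == p && core[j] == c) { seen = 1; break; }
        if (!seen) { pkg[distinct] = p; core[distinct] = c; distinct++; }
    }
    return distinct > 0 ? distinct : -1;
}
```

On the machine from the question one would expect logical_cpus() to return 4 and physical_cores() to return 2, matching "Thread(s) per core: 2".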

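So, in the end, what happens is that time/clock() counts the usage of each logical core as if it were a full physical core. Since all four report ~100% usage, we get ~400% usage, even though it only represents a twofold speedup.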
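The gap between "% cpu" and speedup can be observed from inside a program by reading both the wall clock and the process CPU clock. A minimal sketch, assuming POSIX `clock_gettime` with `CLOCK_MONOTONIC` and `CLOCK_PROCESS_CPUTIME_ID`; measure and spin are hypothetical names, the thread limit of 64 is arbitrary, and the empty `__asm__ volatile` is a GCC/Clang extension that keeps the loop from being optimized away. The ratio cpu/wall is what `time` reports as "% cpu": it counts busy logical CPUs, not work completed per second.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Current time of the given clock, in seconds. */
static double now(clockid_t clk) {
    struct timespec ts;
    clock_gettime(clk, &ts);
    return (double)ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Busy loop of *arg iterations. */
static void *spin(void *arg) {
    uint64_t n = *(const uint64_t *)arg;
    uint64_t acc = 0;
    for (uint64_t i = 0; i < n; i++) {
        acc += 0x42;
        __asm__ volatile("" : "+r"(acc));
    }
    return NULL;
}

/* Run nthreads threads of per_thread iterations each and report wall-clock
   seconds and total CPU seconds of the process. Returns 0 on success,
   -1 for a bad thread count or if not every thread could be started. */
int measure(unsigned nthreads, uint64_t per_thread, double *wall, double *cpu) {
    pthread_t t[64];
    if (nthreads == 0 || nthreads > 64) return -1;

    double w0 = now(CLOCK_MONOTONIC);
    double c0 = now(CLOCK_PROCESS_CPUTIME_ID);
    unsigned started = 0;
    for (; started < nthreads; started++)
        if (pthread_create(&t[started], NULL, spin, &per_thread) != 0) break;
    for (unsigned i = 0; i < started; i++)
        pthread_join(t[i], NULL);
    *wall = now(CLOCK_MONOTONIC) - w0;
    *cpu = now(CLOCK_PROCESS_CPUTIME_ID) - c0;
    return started == nthreads ? 0 : -1;
}
```

The expectation (not a measurement) on a 2-core, 4-thread machine is that cpu/wall approaches 4 with four threads while the wall time for the same total work only halves compared with one thread, which is exactly the pattern in the timings above.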

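Until then, I was convinced this computer had 4 physical cores, and had completely forgotten to check for hyper-threading.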


Source: https://habr.com/ru/post/1654763/

