Java multithreaded threads provide very low performance

I wanted to learn parallel programming to speed up algorithms and choose Java.
I wrote two functions for summing the long integers in an array - one simple iterative array, the second - dividing the array into parts and summing the parts in separate streams.

I expected the logic to be about twice as high using two threads. However, I only have 24% speed. Moreover, using more threads, I do not get an improvement (possibly less than 1%) for two threads. I know that there should be thread creation / overhead pooling, but I think it shouldn't be that big.

Could you explain what I am missing or where is the error in the code? Here is the code:

 import java.util.concurrent.ThreadLocalRandom; public class ParallelTest { public static long sum1 (long[] num, int a, int b) { long r = 0; while (a < b) { r += num[a]; ++a; } return r; } public static class SumThread extends Thread { private long num[]; private long r; private int a, b; public SumThread (long[] num, int a, int b) { super(); this.num = num; this.a = a; this.b = b; } @Override public void run () { r = ParallelTest.sum1(num, a, b); } public long getSum () { return r; } } public static long sum2 (long[] num, int a, int b, int threadCnt) throws InterruptedException { SumThread[] th = new SumThread[threadCnt]; int i = 0, c = (b - a + threadCnt - 1) / threadCnt; for (;;) { int a2 = a + c; if (a2 > b) { a2 = b; } th[i] = new SumThread(num, a, a2); th[i].start(); if (a2 == b) { break; } a = a2; ++i; } for (i = 0; i < threadCnt; ++i) { th[i].join(); } long r = 0; for (i = 0; i < threadCnt; ++i) { r += th[i].getSum(); } return r; } public static void main(String[] args) throws InterruptedException { final int N = 230000000; long[] num = new long[N]; for (int i = 0; i < N; ++i) { num[i] = ThreadLocalRandom.current().nextLong(1, 9999); } // System.out.println(Runtime.getRuntime().availableProcessors()); long timestamp = System.nanoTime(); System.out.println(sum1(num, 0, num.length)); System.out.println(System.nanoTime() - timestamp); for (int n = 2; n <= 4; ++n) { timestamp = System.nanoTime(); System.out.println(sum2(num, 0, num.length, n)); System.out.println(System.nanoTime() - timestamp); } } } 

EDIT: I have an i7 processor with 4 cores (8 threads). The output given by the code is:

 1149914787860 175689196 1149914787860 149224086 1149914787860 147709988 1149914787860 138243999 
+5
source share
2 answers

The program is probably the main memory bandwidth, limited to only two threads, because it is a small cycle that retrieves data almost as fast as RAM can transfer data to the processor.

+3
source

I can think of several reasons why you might not get as many accelerations as you expect.

  • The overhead of creating threads is substantial. Thread start() is an expensive operation that involves several system calls to allocate a thread stack and its red zone and then create your own thread.

  • Threads N will not start at the same time. This means that the time to complete the parallel part of the calculation will be approximately the final time of the last thread - the beginning of the first time. It will be longer than the time when one thread takes over part of its work. (N-1 times the thread creation time ...)

  • Threads N (basically) perform sequential scans of N disjoint sections of the array. This is the bandwidth of the memory, and the scanning method means that the memory caches will be inefficient. Thus, there is a good chance that performance is limited by the speed and throughput of your main memory system equipment.

+3
source

Source: https://habr.com/ru/post/1261107/


All Articles