Why is matrix multiplication slower on a 7-core workstation than on my laptop?

I ran the following MATLAB code:

    rng(1)
    matrix_size = 200;
    iterations = 100000;
    A = rand(matrix_size);
    B = rand(matrix_size);
    profile on
    for i = 1:iterations
        A * B;
    end
    profile off

On my MacBook Air (Intel(R) Core(TM) i5-4260U CPU @ 1.40 GHz) this takes 39 s. On a workstation with 7 available cores (Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00 GHz) it takes 62 s.

I did not specify -singleCompThread. The workstation has 12 cores, but 5 single-threaded processes were already running on it, so I had (almost) 7 cores to myself. Those cores were fully loaded the whole time.

How can that be?

When I run the above code with -singleCompThread, it finishes in 54 s.

1 answer

Quoting a post from the MathWorks Support Team:

As of MATLAB 7.4 (R2007a), MATLAB supports multithreaded computation for a number of functions and for expressions that are combinations of element-wise functions (for example, y = 4 * x * (sin (x) + x ^ 3)). These functions automatically execute on multiple threads, and you do not need to explicitly specify commands to create threads in your code.

For a function or expression to run faster on multiple cores (that is, to show a speedup), the following conditions must all be met:

1) The operations in the algorithm carried out by the function are easily partitioned into sections that can be executed concurrently, with little communication and few sequential operations required. This is the case for all element-wise operations.

2) The data size is large enough that the advantages of concurrent execution outweigh the time required to partition the data and manage separate execution threads. For example, most functions speed up only when the array is larger than several thousand elements.

3) The operation is not memory-bound, that is, its processing time is not dominated by memory access time, as it is for simple operations such as element-wise addition. As a general rule, more complex functions speed up better than simple functions.

Your case does not satisfy conditions 2 and 3. The multiplication is quick and simple and is memory-bound, and your matrices are relatively small. Evidently, multithreading adds more overhead than it saves here, as your test with -singleCompThread shows. You can try the test with larger matrices and see whether the difference changes. You can also run the test on the MacBook with -singleCompThread to see whether the relative single-thread performance falls into the expected range.
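The suggested experiment could look roughly like the following. It is a minimal sketch, assuming maxNumCompThreads is available in your MATLAB release; the matrix sizes and repetition counts are arbitrary illustrative choices, not values from the question. If MATLAB was started with -singleCompThread, both columns will show single-threaded timings.

```matlab
% Compare A*B with one thread and with the default thread count
% for several matrix sizes. Sizes and repetition counts are
% illustrative assumptions, not values from the original post.
sizes = [200 500 1000 2000];
reps  = [1000 100 20 5];      % fewer repetitions for larger matrices

rng(1);
default_threads = maxNumCompThreads;   % current thread limit

fprintf('%6s %14s %14s\n', 'n', 't_single [s]', 't_multi [s]');
for k = 1:numel(sizes)
    n = sizes(k);
    A = rand(n);
    B = rand(n);
    C = A * B;                          % warm-up, not timed

    maxNumCompThreads(1);               % force a single thread
    tic;
    for i = 1:reps(k)
        C = A * B;
    end
    t_single = toc / reps(k);           % mean time per multiplication

    maxNumCompThreads(default_threads); % restore the original limit
    tic;
    for i = 1:reps(k)
        C = A * B;
    end
    t_multi = toc / reps(k);

    fprintf('%6d %14.4f %14.4f\n', n, t_single, t_multi);
end
```

If the multithreaded column only pulls ahead for the larger sizes, that is consistent with thread-management overhead dominating at 200 x 200.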

Another (partial) explanation may be the different vector instruction sets of Sandy Bridge and Haswell, i.e. AVX2. I would run the tests above first before looking into this.

Also note that the MATLAB profiler disables the JIT. The results you obtained may therefore not be representative of what you would see in a real use case.
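To take the profiler out of the measurement, the multiplication can be timed directly. The following is a minimal sketch, assuming a release that provides timeit (R2013b or later); the variable names f and t_call are my own and do not come from the question.

```matlab
% Time A*B without the profiler. timeit calls the function handle
% repeatedly and returns the median time of a single call.
rng(1);
matrix_size = 200;          % same size as in the question
A = rand(matrix_size);
B = rand(matrix_size);

f = @() A * B;
t_call = timeit(f);         % seconds per multiplication

iterations = 100000;        % same count as in the question
fprintf('%.3g s per call, about %.1f s for %d calls\n', ...
        t_call, t_call * iterations, iterations);
```

Running this on both machines gives per-call figures that can be compared with the 39 s and 62 s totals measured under the profiler.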


Source: https://habr.com/ru/post/1265132/

