Where is the point at which adding extra cores or processors does not improve performance at all?

* Adding a second core or processor may increase the performance of your parallel program, but it is unlikely to double it. Likewise, a quad-core machine is not going to run your parallel program four times as fast, partly because of the overhead and coordination described in the previous sections. However, the design of the computer hardware also limits its ability to scale. You can expect a significant performance improvement, but it will not be a full 100% per additional core, and there will almost certainly be a point at which adding additional cores or processors does not improve performance at all.


I read the paragraph above in a book, but I did not understand the last sentence. So, where is the point at which adding extra cores or processors does not improve performance at all?

+4
6 answers

If you take a serial program and a parallel version of the same program, the parallel program has to perform operations that the serial program does not, in particular operations to coordinate the work of several processors. These contribute to what is often called "parallel overhead": the extra work a parallel program must do. This is one of the factors that make it difficult to get a 2x speedup on 2 processors, 4x on 4, or 32000x on 32000 processors.

If you examine the code of a parallel program, you will often find segments that are serial, where only one processor works while the rest sit idle. There are parts of algorithms that do not parallelise, and there are operations that are often not parallelised even though they could be: I/O operations, for example, require some kind of parallel I/O system before they can be parallelised. This "serial fraction" sets a lower bound on the execution time of your computation. Amdahl's Law explains this, and the article on it is a useful starting point for your further reading.
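A minimal sketch of Amdahl's Law in code, assuming a serial fraction s of the total work; the function name and the example values are illustrative, not taken from the answer:

    # Amdahl's Law: upper bound on the speedup when a fraction s of the work is serial.
    # The 5% serial fraction below is an illustrative value; it caps the speedup at 20x.
    def amdahl_speedup(s, p):
        """Best-case speedup on p processors with serial fraction s."""
        return 1.0 / (s + (1.0 - s) / p)

    if __name__ == "__main__":
        for p in (1, 2, 4, 8, 64, 1024):
            print(f"p={p:5d}  speedup <= {amdahl_speedup(0.05, p):.2f}")

Even with 1024 processors the speedup stays below 1/s = 20, which is the "serial fraction" bound described above.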

Even if you have a program that parallelises well, the scaling (i.e., the rate at which the speedup changes as the number of processors increases) is not equal to 1. For most parallel programs, the size of the parallel overhead (the sum of processor time devoted to operations that are needed only because the computation is parallel) grows as a function of the number of processors. This means that adding processors adds parallel overhead, and at some point in the scaling of your program and job the growth in overhead cancels out (or even reverses) the gain in processor power. The Amdahl's Law article also covers Gustafson's Law, which is relevant here.

I have put all of this in very general terms, without reference to current processor and computer architectures; what I am describing are features of parallel computation (as currently understood), not of any particular program or computer.

I disagree with @Daniel Pittman's assertion that these issues are of only theoretical concern. Some of us are working very hard to make our programs scale to very large numbers of processors (1000s). And almost all desktop and office development today, as well as much mobile development, targets multiprocessor systems, and using all those cores is a major concern.

Finally, to answer your question: the point at which adding processors no longer increases execution speed depends on the architecture and the program. Fortunately, it lends itself to empirical investigation. Figuring out the scalability of parallel programs, and identifying ways of improving it, is a growing niche within the software development profession.
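A minimal sketch of such an empirical measurement, assuming a CPU-bound toy workload; the workload, chunk counts, and worker counts are illustrative choices, not from the answer:

    # Time a CPU-bound toy workload with different worker counts and report the speedup.
    # The workload (summing squares over ranges) is only an illustration.
    import time
    from multiprocessing import Pool

    def chunk_sum(bounds):
        lo, hi = bounds
        return sum(i * i for i in range(lo, hi))

    def run(workers, n=2_000_000, chunks=64):
        step = n // chunks
        jobs = [(i * step, (i + 1) * step) for i in range(chunks)]
        start = time.perf_counter()
        with Pool(workers) as pool:
            pool.map(chunk_sum, jobs)
        return time.perf_counter() - start

    if __name__ == "__main__":
        base = run(1)
        for w in (1, 2, 4, 8):
            t = run(w)
            print(f"{w} workers: {t:.2f}s, speedup {base / t:.2f}x")

On a real machine the measured speedup flattens out (and may even drop) once the worker count exceeds what the hardware and the workload can exploit.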

+5

@High Performance Mark is right. This happens when you are trying to solve a fixed-size problem as quickly as possible, so that Amdahl's law applies. It does not (usually) happen when you are trying to solve a problem in a fixed amount of time. In the latter case, you are willing to spend the same amount of time solving a problem:

  • whose size is bigger;
  • whose size is exactly the same as before, but at a greater accuracy.

In this situation, Gustafson’s law applies.
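For reference, Gustafson's law in its usual scaled-speedup form, with s the serial fraction of the time measured on the parallel system and p the number of processors (this notation is assumed here, not taken from the answer):

    S_{\text{scaled}}(p) \;=\; s + p\,(1 - s) \;=\; p - (p - 1)\,s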

So, back to fixed-size problems. In the speedup formula you can distinguish these components:

  • Inherently sequential computation: σ(n)
  • Potentially parallel computation: φ(n)
  • Overhead (communication operations etc.): κ(n, p)

and the speedup for p processors on a problem of size n is bounded by

    \text{speedup}(n, p) \;\le\; \frac{\sigma(n) + \phi(n)}{\sigma(n) + \phi(n)/p + \kappa(n, p)}

Adding processors reduces the computation time but increases the communication time (for message-passing algorithms; for shared-memory algorithms it increases the cost of synchronisation, etc.); if we keep adding processors, at some point the increase in communication time becomes larger than the corresponding decrease in computation time.

When this happens, the parallel execution time starts to increase. Speedup is inversely proportional to execution time, so its curve begins to fall. For any fixed problem size there is an optimal number of processors that minimises the total parallel execution time.
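A small numerical sketch of that behaviour, assuming the linear overhead model κ(n, p) = κ₀·p; sigma, phi and kappa0 below are made-up illustrative values:

    # Sweep the processor count p and watch the speedup peak and then decline.
    # sigma, phi and kappa0 are illustrative values, not taken from the answer.
    def exec_time(p, sigma=1.0, phi=100.0, kappa0=0.05):
        # parallel execution time: sigma(n) + phi(n)/p + kappa(n, p), with kappa = kappa0 * p
        return sigma + phi / p + kappa0 * p

    def speedup(p, sigma=1.0, phi=100.0, kappa0=0.05):
        # speedup relative to the serial time sigma(n) + phi(n)
        return (sigma + phi) / exec_time(p, sigma, phi, kappa0)

    if __name__ == "__main__":
        best = max(range(1, 257), key=speedup)
        print("optimal p =", best)   # roughly sqrt(phi / kappa0), about 45 for these values
        for p in (1, 2, 8, 32, best, 128, 256):
            print(f"p={p:3d}  T={exec_time(p):7.3f}  speedup={speedup(p):6.2f}")

Past the optimum the overhead term dominates and the speedup curve turns downwards, exactly as described above.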

Here is how you can compute precisely (as a closed-form, analytical solution) the point beyond which you gain nothing by adding processors (or cores, if you prefer): take the derivative of the parallel execution time σ(n) + φ(n)/p + κ(n, p) with respect to p and find where it is zero.

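A sketch of that calculation under the assumed linear overhead model κ(n, p) = κ₀(n)·p used in the sketch above (the linear form is an assumption, not something stated in the answer):

    T(n, p) = \sigma(n) + \frac{\phi(n)}{p} + \kappa_0(n)\,p,
    \qquad
    \frac{\partial T}{\partial p} = -\frac{\phi(n)}{p^2} + \kappa_0(n) = 0
    \;\Longrightarrow\;
    p_{\text{opt}} = \sqrt{\frac{\phi(n)}{\kappa_0(n)}}

A different overhead model (for example, one with a log p term for tree-structured reductions) leads to a different closed form, but the procedure is the same.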

+2

The answer is, of course, "it depends", but in the current world of shared-memory multiprocessors the short version is "when the traffic for coordinating shared memory or other resources consumes all the available bus bandwidth and/or CPU time".
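A rough back-of-the-envelope sketch of that ceiling; both bandwidth figures are made-up illustrative numbers, not measurements:

    # Rough ceiling on the number of useful cores when memory bandwidth is the bottleneck.
    # Both figures below are illustrative assumptions.
    MEM_BANDWIDTH_GBS = 50.0     # total memory/bus bandwidth available
    PER_CORE_DEMAND_GBS = 4.0    # bandwidth each core needs to stay busy

    useful_cores = MEM_BANDWIDTH_GBS / PER_CORE_DEMAND_GBS
    print(f"bandwidth saturates at about {useful_cores:.0f} cores; "
          "beyond that, extra cores mostly wait on memory")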

That is a very theoretical limit, though. Almost nothing scales well enough to keep taking advantage of more cores beyond small numbers. Few applications benefit from 4 cores, fewer from 8, and almost none from 64 cores today, well below any theoretical performance limits.

+1

Speaking of x86, that architecture is more or less at its limit. At 3 GHz, electricity travels about 10 cm (actually slightly less) per clock cycle, the die is about 1 cm square, and components must be able to switch state within that single cycle (1/3,000,000,000 of a second). The current manufacturing process (22 nm) gives interconnects that are 88 (silicon) atoms wide (I may have misunderstood this). With this in mind, you realise that there is not much more that physics allows here (how narrow can an interconnect be? 10 atoms? 20?). At the other end, in order to sell a device as "higher performance" than its predecessor, the manufacturer adds a core, which theoretically doubles the processing power.
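The 10 cm figure is simply the distance a signal can cover in one clock period; a quick check of the arithmetic:

    \frac{c}{f} \;=\; \frac{3 \times 10^{8}\ \text{m/s}}{3 \times 10^{9}\ \text{Hz}} \;=\; 0.1\ \text{m} \;=\; 10\ \text{cm per cycle}

(slightly less in practice, since signals in silicon propagate below the speed of light).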

"Theoretically" is actually not quite true. Some specially written applications subdivide a big problem into parts that are small enough to be contained within a single core and its exclusive caches (L1 and L2). The part is given to the kernel, and it is processed for a considerable period of time without accessing the L3 cache or RAM (which it shares with other kernels, and therefore a collision / bottleneck will occur). Upon completion, he writes his results to RAM and receives a new part of the problem for work.

If a core spends 99% of its time on internal processing and 1% reading and writing shared memory (L3 cache and RAM), you could have roughly 99 more cores doing the same thing, because in the end the limiting factor is the amount of shared-memory access available. Given my 99:1 example, such an application could make efficient use of 100 cores.
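The 99:1 figure can be read as a saturation bound on the shared-memory path; a tiny sketch of that arithmetic (the 1% fraction is the answer's own example value):

    # If each core needs the shared-memory path for only 1% of its time,
    # roughly 1 / 0.01 = 100 cores can run before that path is saturated.
    shared_fraction = 0.01      # fraction of time each core spends on shared memory
    max_useful_cores = 1 / shared_fraction
    print(f"shared-memory path saturates at about {max_useful_cores:.0f} cores")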

With more common programs (office applications and the like), the extra processing power will hardly be noticed. Some parts of such programs may have sections written to use multiple cores, and if you know which parts those are, you may notice that those parts run much faster.

3 GHz was used as an example because it works out neatly with the speed of light of 300,000,000 m/s. I recently read that AMD's latest architecture could run at 5 GHz, but that was with special coolers, and even then it was slower (did less work) than an Intel i7 running at a considerably lower frequency.

+1

To a large extent this depends on your application architecture and program design. Adding cores improves parallel processing. If your program does nothing in parallel and runs only sequentially, adding cores will not improve its performance at all. It may still speed up other things, such as the internal processing of the framework (if you are using one).

So the more parallel processing your program allows, the better it scales to larger numbers of cores. But if your program limits parallel processing (by design or by the nature of the data), it will not scale indefinitely. It takes a great deal of effort to make a program run on hundreds of cores, mainly because of growing overhead, resource locking, and the necessary data coordination. The most powerful supercomputers are indeed massively multi-core, but writing programs that can use them takes significant effort, and they can only show their strength on inherently parallel tasks.

+1

You may find the following article helpful: Hadi Esmaeilzadeh et al., "Dark Silicon and the End of Multicore Scaling", ISCA '11, June 4-8, 2011, San Jose, California, USA.

+1

Source: https://habr.com/ru/post/1402587/

