Why are math libraries often compared using FLOPS?

Math libraries are often compared based on FLOPS. What information am I supposed to take away when I am shown a graph of FLOPS versus problem size, with one curve for each of several math libraries?

FLOPS as an indicator of performance would make more sense to me if the comparison were between two implementations of the same algorithm, or between the same software on two different pieces of hardware. I don't understand why it is a suitable or popular way to compare things like matrix–matrix multiplication.

Is the implication simply that the underlying algorithms are all roughly the same, so the code that pushes the most work through the floating-point units is the fastest, i.e., the one that minimizes overhead?

Examples abound.

http://eigen.tuxfamily.org/index.php?title=Benchmark

https://code.google.com/p/blaze-lib/wiki/Benchmarks

https://software.intel.com/en-us/articles/a-simple-example-to-measure-the-performance-of-an-intel-mkl-function

By contrast, these LAPACK and Armadillo benchmarks report absolute time for the operation, which makes more sense to me.

http://www.netlib.org/lapack/lug/node71.html

http://arma.sourceforge.net/speed.html

Related

What is FLOP / s and is this a good measure of performance?

2 answers

People usually compare math libraries in order to choose the one that minimizes the runtime of their program. For such benchmarks, two things need to be considered: the performance of the libraries on a given input, and whether that input is representative of your use case.

If we assume that a given task (for example, scaling a vector) requires the same number of floating-point operations in every library, then we can expect the library achieving the higher FLOPS to finish first.

That assumption does not always hold: two libraries may perform different numbers of floating-point operations for the same task (for example, matrix–matrix multiplication). If so, a library can achieve fewer FLOPS yet finish in less time than a library achieving more FLOPS. In those cases it is more reasonable to look at total runtime. When authors publish comparisons in FLOPS, either they believe each library performs roughly the same number of operations overall, or they simply divide the theoretical number of operations required for the task by the measured total runtime (an "effective FLOPS" figure that is really just a rescaled runtime). Either way, you will want to check the benchmark methodology.
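The "effective FLOPS" idea can be sketched in a few lines. This is an illustrative example, not code from any of the libraries above: it times a deliberately naive pure-Python matrix multiply and divides the classical operation count by the measured runtime.

```python
import time

def matmul(a, b):
    """Naive n x n matrix multiplication (illustrative only, very slow)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

n = 60
a = [[1.0] * n for _ in range(n)]
b = [[2.0] * n for _ in range(n)]

start = time.perf_counter()
c = matmul(a, b)
elapsed = time.perf_counter() - start

# The classical algorithm needs about 2*n^3 floating-point operations
# (n^3 multiplies plus n^3 additions). A benchmark that reports
# "FLOPS" often just divides this theoretical count by the runtime,
# even if the library under test used a different algorithm.
theoretical_ops = 2 * n ** 3
effective_flops = theoretical_ops / elapsed
print(f"{effective_flops:.3e} effective FLOP/s")
```

Note that if a library used, say, Strassen's algorithm internally, this figure would overstate the operations it actually performed; the number is only meaningful relative to the stated operation count.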

The goal of plotting performance (such as FLOPS) against input size is to help people judge performance on inputs representative of their own use. If you know you will mostly work with small vectors, say shorter than 10 elements, then you do not care how fast a library is on 1 GB vectors, and you do not want those inputs to dominate the comparison.

Historically, reporting FLOPS has been popular (perhaps in part because it is easy to explain to mathematicians). I believe one motivation is that "you can scale a size-10 vector at 10,000 FLOPS, but a size-100 vector only at 100 FLOPS" is easier to digest than "you can scale a size-10 vector in 0.001 seconds, but a size-100 vector takes 1 second." If you report total runtime instead, you will probably want to normalize it by the input size for comparison.
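The arithmetic behind that example is just the operation count divided by the achieved rate, which can be checked directly:

```python
def runtime_seconds(op_count, achieved_flops):
    """Runtime implied by an operation count and an achieved FLOP/s rate."""
    return op_count / achieved_flops

# Scaling a size-10 vector (10 operations) at 10,000 FLOP/s:
print(runtime_seconds(10, 10_000))   # 0.001 seconds
# Scaling a size-100 vector (100 operations) at 100 FLOP/s:
print(runtime_seconds(100, 100))     # 1.0 second
```

So the two phrasings in the answer carry exactly the same information; the FLOPS version just factors out the input size.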


In high-performance computing, one goal is often to get the most out of your hardware in the shortest possible time. This minimizes the time that people (or other time-sensitive systems) spend waiting for results. In large computing facilities, operating costs (power consumption, maintenance labor, etc.) are roughly constant over time, so computation time translates directly into the bottom line: money paid per calculation.

FLOPS provides an estimate of how fully an algorithm uses the processor. The measured FLOPS of the algorithm, divided by the peak FLOPS the processor is capable of, gives a fraction between 0 and 1. The closer it is to 1, the more efficiently the algorithm uses the CPU, which translates into more bang for the buck on that type of CPU (the algorithm delivers the solution faster, so the computation costs less).
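A minimal sketch of that efficiency fraction. The peak figure below is a made-up hardware specification (assumed core count, clock, and FLOPs per cycle), not a real CPU's numbers:

```python
def cpu_efficiency(measured_flops, peak_flops):
    """Fraction of the processor's peak floating-point rate actually achieved."""
    return measured_flops / peak_flops

# Hypothetical peak: 4 cores x 3 GHz x 16 FLOPs/cycle (assumed values).
peak = 4 * 3e9 * 16            # 1.92e11 FLOP/s
measured = 4.8e10              # e.g., taken from a benchmark run
eff = cpu_efficiency(measured, peak)
print(f"efficiency = {eff:.2f}")   # efficiency = 0.25
```

An efficiency of 0.25 would mean the algorithm keeps the floating-point units busy only a quarter of the time, which is the signal the answer describes for trying a different algorithm or build configuration.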

The result is specific to the CPU (its instruction set) and to the algorithm. If an algorithm achieves a small fraction on a particular processor, it is not using that CPU well. That observation can lead to choosing a different algorithm, different compilation options (for example, to optimize differently or to select different instructions), or a server farm on which the algorithm runs more efficiently. For large computations that are repeated (say, daily), the economic benefit of an algorithm that uses the processor efficiently over one that does not can be substantial.


Source: https://habr.com/ru/post/987774/

