How to measure GFLOPS of a matrix multiplication kernel?

Question


In the book Programming Massively Parallel Processors, GFLOPS is used to compare the performance of different matrix multiplication kernels. How can I calculate this for my own kernels on my machine?

Somewhere in the NVIDIA forums I found this "algorithm", but I don't know how valid it is or where the factor of two comes from.

 NumOps = 2 * pow(MatrixSize, 3)
 gflops = 1.0e-9 * NumOps / ExecutionTime
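The forum formula translates directly into a small C helper. This is a minimal sketch assuming ExecutionTime is wall-clock time in seconds; the function names `matmul_flops` and `matmul_gflops` are hypothetical. Doing the arithmetic in `double` matters: 2 * N^3 no longer fits in a 32-bit `int` once N reaches 1024.

```c
/* Floating-point operations in an N x N by N x N matrix multiply:
   one multiply and one add per inner-loop iteration, N^3 iterations. */
double matmul_flops(int matrix_size)
{
    /* Use double: 2 * N^3 overflows a 32-bit int for N >= 1024. */
    double n = (double)matrix_size;
    return 2.0 * n * n * n;
}

/* gflops = 1.0e-9 * NumOps / ExecutionTime, with time in seconds. */
double matmul_gflops(int matrix_size, double execution_time_s)
{
    return 1.0e-9 * matmul_flops(matrix_size) / execution_time_s;
}
```

For example, a 1000 x 1000 multiply (2 * 10^9 operations) that takes 2 seconds runs at about 1 GFLOPS.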

P.S. Feel free to change the tags.


benchmarking gpgpu cuda

Framester Jul 29 '11 at 12:26


1 answer

Heatsink · Accepted Answer · 2011-07-29T14:13:56+0000

You can measure GFLOPS by running the algorithm on a large input and measuring its execution time, then plugging the execution time and matrix size into the formula from the question. For matrix sizes large enough to keep the whole device busy, the GFLOPS figure depends only weakly on the matrix size.
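A minimal sketch of that measurement on the CPU, assuming a POSIX system with `clock_gettime` and using a plain naive multiply as a hypothetical stand-in for your own kernel (`naive_matmul` and `measure_gflops` are illustrative names). It does one untimed warm-up run, times several repetitions, and applies the NumOps formula to the total. For a CUDA kernel the same idea applies, but because kernel launches return before the work finishes, you would time with `cudaEventRecord`/`cudaEventElapsedTime` or call `cudaDeviceSynchronize()` before stopping a host timer.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime / CLOCK_MONOTONIC */
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for the kernel under test:
   row-major C = A * B for n x n matrices. */
static void naive_matmul(int n, const float *A, const float *B, float *C)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Time `reps` runs of an n x n multiply and return the achieved GFLOPS,
   or -1.0 on allocation failure, a wrong result, or a zero-length timing. */
double measure_gflops(int n, int reps)
{
    size_t count = (size_t)n * n;
    float *A = malloc(count * sizeof *A);
    float *B = malloc(count * sizeof *B);
    float *C = malloc(count * sizeof *C);
    if (!A || !B || !C) { free(A); free(B); free(C); return -1.0; }
    for (size_t i = 0; i < count; i++) { A[i] = 1.0f; B[i] = 2.0f; }

    naive_matmul(n, A, B, C);            /* warm-up run, not timed */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < reps; r++)
        naive_matmul(n, A, B, C);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* With all-ones A and all-twos B, every element of C must equal 2n;
       checking it also keeps the compiler from discarding the work. */
    int ok = (C[0] == 2.0f * (float)n);
    free(A); free(B); free(C);

    double seconds = (double)(t1.tv_sec - t0.tv_sec)
                   + 1e-9 * (double)(t1.tv_nsec - t0.tv_nsec);
    if (!ok || seconds <= 0.0)
        return -1.0;

    double nd = (double)n;
    double num_ops = 2.0 * nd * nd * nd * reps;  /* NumOps for all timed runs */
    return 1.0e-9 * num_ops / seconds;
}
```

Usage: call `measure_gflops(1024, 5)` and increase `n` until the reported value stops changing much; small sizes mostly measure overhead rather than throughput.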

The GPU matrix multiplication algorithm performs the same number of floating-point operations as the naive algorithm:

 for (i = 0; i < MatrixSize; i++)
     for (j = 0; j < MatrixSize; j++)
         for (k = 0; k < MatrixSize; k++)
             C[j][i] += A[j][k] * B[k][i];  /* one multiply + one add */

There are two floating-point operations in the loop body (one multiply and one add), and the loop body runs MatrixSize * MatrixSize * MatrixSize times; together these give the formula for NumOps. GFLOPS is then just the number of operations per second divided by 10^9 ("giga").
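To see where the factor of two comes from, here is a minimal sketch (the name `count_matmul_flops` is hypothetical) that walks the same triple loop and counts one operation for the multiply and one for the add instead of touching any data. The total is exactly 2 * MatrixSize^3.

```c
/* Count the floating-point operations the naive triple loop performs.
   Matrix values do not affect the count, so no data is needed. */
long long count_matmul_flops(int matrix_size)
{
    long long ops = 0;
    for (int i = 0; i < matrix_size; i++)
        for (int j = 0; j < matrix_size; j++)
            for (int k = 0; k < matrix_size; k++) {
                ops += 1;  /* multiply: A[j][k] * B[k][i] */
                ops += 1;  /* add:      C[j][i] += ...    */
            }
    return ops;
}
```

For MatrixSize = 10 this returns 2000, matching 2 * 10^3.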
