You can measure GFLOPs by running the algorithm on a large input and measuring the execution time, then plugging the execution time and matrix size into the formula. For matrices large enough to keep the entire machine busy, the FLOPs figure depends only weakly on matrix size.
The GPU matrix multiplication algorithm performs the same number of floating point operations as the naive algorithm.
    for (i = 0; i < MatrixSize; i++)
        for (j = 0; j < MatrixSize; j++)
            for (k = 0; k < MatrixSize; k++)
                C[j][i] += A[j][k] * B[k][i];
There are 2 floating point operations in the loop body (one multiply and one add) and MatrixSize * MatrixSize * MatrixSize iterations of the loop body, which gives NumOps = 2 * MatrixSize^3. GFLOPs is then just the number of operations per second divided by 10^9 ('giga').
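As a rough illustration of the arithmetic, here is a minimal CPU-side sketch in C. The helper names (`num_ops`, `gflops`, `time_naive_matmul`) are made up for this example, and it times the naive loop with `clock()`, which measures CPU time and is only an assumption about how you might time things. For a GPU kernel you would instead need a timer that covers the kernel's completion.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* NumOps for the naive algorithm: one multiply and one add per
   iteration, MatrixSize^3 iterations (hypothetical helper). */
double num_ops(size_t n) {
    return 2.0 * (double)n * (double)n * (double)n;
}

/* GFLOPs = operations per second divided by 10^9. */
double gflops(size_t n, double seconds) {
    assert(seconds > 0.0);
    return num_ops(n) / seconds / 1e9;
}

/* Runs the naive n x n multiply once and returns the elapsed CPU
   time in seconds, or -1.0 if allocation fails. */
double time_naive_matmul(size_t n) {
    float *A = malloc(n * n * sizeof *A);
    float *B = malloc(n * n * sizeof *B);
    float *C = malloc(n * n * sizeof *C);
    if (!A || !B || !C) {
        free(A); free(B); free(C);
        return -1.0;
    }
    for (size_t x = 0; x < n * n; x++) {
        A[x] = 1.0f;
        B[x] = 2.0f;
        C[x] = 0.0f;
    }

    clock_t start = clock();
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                C[j * n + i] += A[j * n + k] * B[k * n + i];
    clock_t end = clock();

    /* Read a result so the compiler cannot discard the loop. */
    volatile float sink = C[0];
    (void)sink;

    free(A); free(B); free(C);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

To use it, call `time_naive_matmul(n)` for a reasonably large `n` and, if the result is positive, pass it to `gflops(n, seconds)`. For very small `n` the elapsed time can round to zero at the resolution of `clock()`, so the measurement is only meaningful for larger inputs.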