I see that nvprof can profile the number of flops in a kernel (using the parameters shown below). Also, when I look through the documentation (here: http://docs.nvidia.com/cuda ...), it says that flop_count_sp is the number of single-precision floating-point operations performed (add, multiply, multiply-accumulate). Each multiply-accumulate operation contributes 2 to the count.
However, when I run it, the reported flop_count_sp (which I expected to be flop_count_sp_add + flop_count_sp_mul + flop_count_sp_special + 2 * flop_count_sp_fma) does not include the value of flop_count_sp_special in the sum.
Could you suggest which I should use? Should I add this value to flop_count_sp, or should I use a formula that does not include flop_count_sp_special?
Also, could you tell me what these special operations are?
I use the following command line:
nvprof --metrics flops_sp --metrics flops_sp_add --metrics flops_sp_mul --metrics flops_sp_fma --metrics flops_sp_special myKernel args
Here myKernel is the name of my CUDA kernel, which takes the input arguments given by args.
For example, a section of my nvprof output is shown below:
==20549== Profiling result:
==20549== Metric result:
Invocations  Metric Name            Metric Description                         Min    Max    Avg
Device "Tesla K40c (0)"
    Kernel: mykernel(float*, int, int, float*, int, float*, int*)
          2  flop_count_sp          Floating Point Operations(Single Precisi   70888  70888  70888
          2  flop_count_sp_add      Floating Point Operations(Single Precisi   14465  14465  14465
          2  flop_count_sp_mul      Floating Point Operation(Single Precisio   14465  14465  14465
          2  flop_count_sp_fma      Floating Point Operations(Single Precisi   20979  20979  20979
          2  flop_count_sp_special  Floating Point Operations(Single Precisi   87637  87637  87637
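To make the discrepancy concrete, here is a small arithmetic check, written as a Python sketch. The numbers are copied from the nvprof output above. The variable names are only illustrative and are not part of nvprof. It compares the reported flop_count_sp against the sum with and without the special count.

```python
# Values copied from the nvprof output above (identical Min/Max/Avg).
sp_total = 70888     # flop_count_sp
sp_add = 14465       # flop_count_sp_add
sp_mul = 14465       # flop_count_sp_mul
sp_fma = 20979       # flop_count_sp_fma
sp_special = 87637   # flop_count_sp_special

# Each multiply-accumulate contributes 2 to the count.
without_special = sp_add + sp_mul + 2 * sp_fma
with_special = without_special + sp_special

print("add + mul + 2*fma           =", without_special)
print("add + mul + 2*fma + special =", with_special)
print("matches flop_count_sp without special:", without_special == sp_total)
print("matches flop_count_sp with special:   ", with_special == sp_total)
```

For this run, the reported total equals add + mul + 2 * fma exactly, so the special count appears to be left out of flop_count_sp.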