Nvidia nvprof outputs for FLOPS

I see that nvprof can report the number of floating-point operations in a kernel (using the parameters shown below). The documentation (here http://docs.nvidia.com/cuda ...) says that flop_count_sp is "the number of single-precision floating-point operations performed (add, multiply, multiply-accumulate). Each multiply-accumulate operation contributes 2 to the count."

However, when I check the results, I find that flop_count_sp (which should be flop_count_sp_add + flop_count_sp_mul + flop_count_sp_special + 2 * flop_count_sp_fma) does not include flop_count_sp_special in the sum.

Could you suggest which I should use? Should I add this value to flop_count_sp, or should I use a formula that does not include flop_count_sp_special?

Also, could you tell me what these special operations are?

I use the following command line:

nvprof --metrics flops_sp --metrics flops_sp_add --metrics flops_sp_mul --metrics flops_sp_fma --metrics flops_sp_special myKernel args

Here myKernel is the name of my CUDA kernel, which takes the input arguments given by args.

For example, a section of my nvprof output is shown below:

 ==20549== Profiling result:
 ==20549== Metric result:
 Invocations                               Metric Name                        Metric Description         Min         Max         Avg
 Device "Tesla K40c (0)"
    Kernel: mykernel(float*, int, int, float*, int, float*, int*)
           2                             flop_count_sp  Floating Point Operations(Single Precisi       70888       70888       70888
           2                         flop_count_sp_add  Floating Point Operations(Single Precisi       14465       14465       14465
           2                         flop_count_sp_mul  Floating Point Operation(Single Precisio       14465       14465       14465
           2                         flop_count_sp_fma  Floating Point Operations(Single Precisi       20979       20979       20979
           2                     flop_count_sp_special  Floating Point Operations(Single Precisi       87637       87637       87637
1 answer

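The "special" operations are things like: recip sqrt, log, exp, sin, cos. Note that these are the fast, approximate versions, such as you get from intrinsics or when compiling with fast math (-use_fast_math).

And yes, as your numbers show, they are not counted in flop_count_sp. ... (8.0) ...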
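The numbers in the question can be checked directly. Below is a minimal Python sketch, with the values copied from the Avg column of the nvprof output above; the helper name weighted_sum is made up for illustration and is not part of any NVIDIA tool.

```python
# Values copied from the nvprof output in the question (Avg column).
metrics = {
    "flop_count_sp": 70888,
    "flop_count_sp_add": 14465,
    "flop_count_sp_mul": 14465,
    "flop_count_sp_fma": 20979,
    "flop_count_sp_special": 87637,
}


def weighted_sum(m, include_special):
    """add + mul + 2 * fma, optionally plus the special-operation count."""
    total = (
        m["flop_count_sp_add"]
        + m["flop_count_sp_mul"]
        + 2 * m["flop_count_sp_fma"]  # each FMA counts as 2 operations
    )
    if include_special:
        total += m["flop_count_sp_special"]
    return total


without_special = weighted_sum(metrics, include_special=False)
with_special = weighted_sum(metrics, include_special=True)

print("reported flop_count_sp:", metrics["flop_count_sp"])
print("add + mul + 2*fma:", without_special)
print("add + mul + 2*fma + special:", with_special)
```

For this run, the sum without the special term reproduces the reported flop_count_sp exactly, while the sum with it does not. If you want a total that also counts special operations, you would have to add flop_count_sp_special yourself.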


Source: https://habr.com/ru/post/1678616/