Multiply and Add Functions

This question is about the mad() functions available in OpenCL, which promise a significant speedup for computations of the form:

a * b + c

when written as mad(a, b, c) and compiled with -cl-mad-enable.
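
For illustration only (the variable names are placeholders), the two forms look like this in OpenCL C:

    double r1 = a * b + c;       /* plain expression */
    double r2 = mad(a, b, c);    /* built-in; may map to a hardware multiply-add */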

I tried computing expressions of the form a + b * c + d * e, using mad() on very large arrays and expecting a significant improvement. Surprisingly, it took the same time.
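
A minimal sketch of the kind of kernel being compared (the kernel and buffer names are hypothetical, and cl_khr_fp64 is assumed to be available for double):

    // Sketch of the kind of kernel being benchmarked; names are placeholders.
    #pragma OPENCL EXTENSION cl_khr_fp64 : enable

    __kernel void sum_of_products(__global const double *a,
                                  __global const double *b,
                                  __global const double *c,
                                  __global const double *d,
                                  __global const double *e,
                                  __global double *out)
    {
        size_t i = get_global_id(0);
        // Plain form: out[i] = a[i] + b[i] * c[i] + d[i] * e[i];
        // Explicit mad() form, written as two chained multiply-adds:
        out[i] = mad(d[i], e[i], mad(b[i], c[i], a[i]));
    }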

If anyone has experience with this, I would appreciate some insight. I had the impression that it should help, since most resources are full of praise for mad(). Note: the data types I use are all double, and, in case it matters, my use of mad() led to a huge loss of accuracy.

+4
1 answer

(1) There is a big difference between being able to handle doubles at all and being able to handle double precision efficiently. Most recent GPUs do handle doubles, but roughly 2x-4x slower than single precision.

However, AFAIK all GPUs that handle doubles have multiply-add (madd) instructions. AMD documents this: for example, http://developer.amd.com/gpu_assets/r600isa.pdf from 2008 lists a MULADD_64 instruction. I have seen less detailed documentation from Nvidia, but, for example, http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/Floating_Point_on_NVIDIA_GPU_White_Paper.pdf says that Nvidia hardware has FMA (fused multiply-add). The Intel GPU manuals at www.x.org/docs/intel do not mention double precision (at least nothing Google can find).

(2) However, probably the main reason you don't see a difference when using mad() is that the compiler already recognizes on its own that a mad instruction can be used.
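
In other words, with -cl-mad-enable the compiler is free to contract a * b + c into a single multiply-add by itself, so writing mad() explicitly may change nothing. A host-side sketch of passing the flag (assuming a program and device handle already exist):

    /* Allow a*b+c to be compiled to a single mad, with reduced accuracy. */
    cl_int err = clBuildProgram(program, 1, &device, "-cl-mad-enable", NULL, NULL);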

On some GPUs you can look at the generated code, for instance with AMD CodeAnalyst or the GPU ShaderAnalyzer (http://developer.amd.com/tools/shader/Pages/default.aspx) for OpenGL code.

I spent a fair amount of time looking at the code generated with these tools, and IIRC it was already optimized. TBD: show an example here.
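
For reference, here is a minimal sketch of how the compiled program binary can be retrieved on the host for inspection with such tools; it assumes an already-built, single-device cl_program and omits error checking:

    #include <stdio.h>
    #include <stdlib.h>
    #include <CL/cl.h>

    /* Write the device binary of an already-built, single-device program to a
       file so it can be inspected or disassembled with vendor tools. */
    static void dump_program_binary(cl_program program, const char *path)
    {
        size_t bin_size = 0;
        clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES,
                         sizeof(bin_size), &bin_size, NULL);

        unsigned char *binary = malloc(bin_size);
        clGetProgramInfo(program, CL_PROGRAM_BINARIES,
                         sizeof(binary), &binary, NULL);

        FILE *f = fopen(path, "wb");
        fwrite(binary, 1, bin_size, f);
        fclose(f);
        free(binary);
    }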

+2
