A surface reference is faster than a surface object

I recently replaced the surface reference in my algorithm with a surface object, and noticed that the program now runs slower.

Here is a comparison for a simple example in which I fill a 3D float array [400 * 400 * 400] with a constant value.

Surface reference API

Time: 9.068928 ms

surface<void, cudaSurfaceType3D> s_volumeSurf;
...
surf3Dwrite(value, s_volumeSurf, px*sizeof(float), py, pz, cudaBoundaryModeTrap);

Surface object API

Time: 14.960256 ms

cudaSurfaceObject_t l_volSurfObj;
...
surf3Dwrite(value, l_volSurfObj, px*sizeof(float), py, pz, cudaBoundaryModeTrap);

This was tested on a GTX 680 (compute capability 3.0) with CUDA 5.0.
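For anyone who wants to reproduce the surface object measurement, here is a minimal sketch of what a complete test might look like. It is not the original benchmark: the kernel name `fillVolume`, the `CHECK` macro, the 8x8x8 block size, and the use of CUDA events for timing are all assumptions made for illustration. Only the `surf3Dwrite` call and the 400 * 400 * 400 float volume come from the post above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Assumed helper (not from the post): abort if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel: each thread writes one voxel via the surface object.
__global__ void fillVolume(cudaSurfaceObject_t volSurfObj, float value,
                           int nx, int ny, int nz)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    int pz = blockIdx.z * blockDim.z + threadIdx.z;
    if (px < nx && py < ny && pz < nz) {
        // The x coordinate of a surface write is a byte offset.
        surf3Dwrite(value, volSurfObj, px * sizeof(float), py, pz,
                    cudaBoundaryModeTrap);
    }
}

int main()
{
    const int n = 400;          // volume edge length, as in the post
    const float value = 1.0f;   // arbitrary fill value (assumption)

    // Allocate a 3D CUDA array that permits surface load/store.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t volume = nullptr;
    CHECK(cudaMalloc3DArray(&volume, &desc, make_cudaExtent(n, n, n),
                            cudaArraySurfaceLoadStore));

    // Wrap the array in a surface object.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = volume;
    cudaSurfaceObject_t volSurfObj = 0;
    CHECK(cudaCreateSurfaceObject(&volSurfObj, &res));

    dim3 block(8, 8, 8);        // assumed launch configuration
    dim3 grid((n + block.x - 1) / block.x,
              (n + block.y - 1) / block.y,
              (n + block.z - 1) / block.z);

    // Time a single kernel launch with CUDA events.
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaEventRecord(start));
    fillVolume<<<grid, block>>>(volSurfObj, value, n, n, n);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    printf("Time: %f ms\n", ms);

    // Copy the volume back and check that every voxel was written.
    std::vector<float> host((size_t)n * n * n);
    cudaMemcpy3DParms copy = {};
    copy.srcArray = volume;
    copy.dstPtr = make_cudaPitchedPtr(host.data(), n * sizeof(float), n, n);
    copy.extent = make_cudaExtent(n, n, n);
    copy.kind = cudaMemcpyDeviceToHost;
    CHECK(cudaMemcpy3D(&copy));

    size_t bad = 0;
    for (float v : host) {
        if (v != value) ++bad;
    }
    printf("%s\n", bad == 0 ? "OK" : "MISMATCH");

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaDestroySurfaceObject(volSurfObj));
    CHECK(cudaFreeArray(volume));
    return bad == 0 ? 0 : 1;
}
```

A single timed launch also includes one-time startup costs, so running the kernel once as a warm-up and averaging over several launches would likely give a fairer comparison between the two variants.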

Does anyone have an explanation for this difference?

1 answer

In the surface object case, the surface descriptors are fetched from global memory. In the surface reference case, these descriptors are compiled into constant memory. Fetching the descriptors from constant memory can be much faster than fetching them from global memory. If your kernel is small enough, or the L1 cache is disabled, the performance loss can be significant.

You can dump the SASS code (for example with cuobjdump -sass) to see the difference.


Source: https://habr.com/ru/post/1482911/

