A surface reference is faster than a surface object

Question


I recently replaced the surface reference in my algorithm with a surface object, and then noticed that the program runs slower.

Here is a comparison for a simple example that fills a 3D float array of 400 × 400 × 400 elements with a constant value:

Surface reference API

Time: 9.068928 ms

    surface<void, cudaSurfaceType3D> s_volumeSurf;
    ...
    surf3Dwrite(value, s_volumeSurf, px*sizeof(float), py, pz, cudaBoundaryModeTrap);

Surface object API

Time: 14.960256 ms

    cudaSurfaceObject_t l_volSurfObj;
    ...
    surf3Dwrite(value, l_volSurfObj, px*sizeof(float), py, pz, cudaBoundaryModeTrap);

This was tested on a GTX 680 (compute capability 3.0) with CUDA 5.0.
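The kernels and host setup are elided above, so here is a minimal sketch of how the measurement could be reproduced, assuming a 3D cudaArray allocated with cudaArraySurfaceLoadStore and one thread per element. The names fillObject, fillReference and runBenchmark, the 8 × 8 × 8 block size and the untimed warm-up launch are illustrative choices, not the poster's code. The surface reference path is compiled only on toolkits older than CUDA 12, which removed surface references, so on current toolkits only the surface object is timed. Timings will depend on the GPU and toolkit.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Return -1 from the enclosing function if a CUDA call fails.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,  \
                         cudaGetErrorString(err_));                  \
            return -1;                                               \
        }                                                            \
    } while (0)

// Surface object variant: the handle is passed as a kernel argument.
__global__ void fillObject(cudaSurfaceObject_t surf, int n, float value)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    int pz = blockIdx.z * blockDim.z + threadIdx.z;
    if (px < n && py < n && pz < n)
        // The x coordinate of surf3Dwrite is a byte offset, hence * sizeof(float).
        surf3Dwrite(value, surf, px * (int)sizeof(float), py, pz, cudaBoundaryModeTrap);
}

#if CUDART_VERSION < 12000
// Surface reference variant: a module-scope surface bound to the array on the host.
surface<void, cudaSurfaceType3D> s_volumeSurf;

__global__ void fillReference(int n, float value)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    int pz = blockIdx.z * blockDim.z + threadIdx.z;
    if (px < n && py < n && pz < n)
        surf3Dwrite(value, s_volumeSurf, px * (int)sizeof(float), py, pz, cudaBoundaryModeTrap);
}
#endif

// Fills an n*n*n float volume with `value` through each available API, prints the
// kernel times, reads the volume back and returns the number of wrong elements
// (or -1 on a CUDA error).
long long runBenchmark(int n, float value)
{
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaExtent extent = make_cudaExtent(n, n, n);  // in elements, since an array is involved
    cudaArray_t arr = nullptr;
    CHECK(cudaMalloc3DArray(&arr, &desc, extent, cudaArraySurfaceLoadStore));

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaSurfaceObject_t surf = 0;
    CHECK(cudaCreateSurfaceObject(&surf, &res));

    dim3 block(8, 8, 8);
    dim3 grid((n + 7) / 8, (n + 7) / 8, (n + 7) / 8);
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    float ms = 0.0f;

    fillObject<<<grid, block>>>(surf, n, value);  // warm-up launch, not timed
    CHECK(cudaEventRecord(start));
    fillObject<<<grid, block>>>(surf, n, value);
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    std::printf("surface object:    %f ms\n", ms);

#if CUDART_VERSION < 12000
    CHECK(cudaBindSurfaceToArray(s_volumeSurf, arr));
    fillReference<<<grid, block>>>(n, value);  // warm-up launch, not timed
    CHECK(cudaEventRecord(start));
    fillReference<<<grid, block>>>(n, value);
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    std::printf("surface reference: %f ms\n", ms);
#endif
    CHECK(cudaGetLastError());

    // Copy the volume back to the host and count elements that do not hold `value`.
    std::vector<float> host((size_t)n * n * n);
    cudaMemcpy3DParms copy = {};
    copy.srcArray = arr;
    copy.dstPtr = make_cudaPitchedPtr(host.data(), n * sizeof(float), n, n);
    copy.extent = extent;
    copy.kind = cudaMemcpyDeviceToHost;
    CHECK(cudaMemcpy3D(&copy));
    long long wrong = 0;
    for (float v : host) wrong += (v != value);

    CHECK(cudaDestroySurfaceObject(surf));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFreeArray(arr));
    return wrong;
}

int main()
{
    long long wrong = runBenchmark(400, 1.0f);  // the question's 400^3 volume
    std::printf("wrong elements: %lld\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```

After building with nvcc (for example nvcc -O3 fill_volume.cu -o fill_volume), running cuobjdump -sass fill_volume prints the SASS of both kernels, which is the comparison the answer below suggests.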

Does anyone have an explanation for this difference?


cuda

Arnaud May 27 '13 at 7:39


1 answer

longlee · Accepted Answer · 2013-07-16T06:20:13+0000

With a surface object, the surface descriptor is fetched from global memory. With a surface reference, the descriptor is compiled into constant memory. Fetching the descriptor from constant memory can be much faster than fetching it from global memory, so if your kernel is small enough or the L1 cache is disabled, you can observe a significant performance loss with surface objects.

You can disassemble the compiled kernels into SASS (for example with cuobjdump -sass) to see the difference.
