2D Pitch Alignment

2D textures are a useful feature of CUDA in image processing applications. To bind the height of the linear memory to two-dimensional textures, the memory must be aligned. cudaMallocPitch is a good option for allocating allocated memory. On my device, the step returned by the cudaMallocPitch is a multiple of 512, that is, the memory is aligned by 512 bytes.

The actual alignment requirement for the device is determined by cudaDeviceProp::texturePitchAlignment , which is 32 bytes on my device.

My question is:

If the actual alignment requirement for 2D textures is 32 bytes, then why does cudaMallocPitch return 512-byte aligned memory?

Isn't that a waste of memory? For example, if I create an 8-bit image 513 x 100 in size, it will occupy 1024 x 100 bytes.

I get this behavior on the following systems:

1: Asus G53JW + Windows 8 x64 + GeForce GTX 460M + CUDA 5 + Core i7 740QM + 4 GB RAM

2: Dell Inspiron N5110 + Windows 7 x64 + GeForce GT525M + CUDA 4.2 + Corei7 2630QM + 6 GB RAM

+4
source share
2 answers

This is a bit speculative answer, but keep in mind that there are two alignment properties that the distribution step should satisfy for textures, one for the textutr pointer and one for the texture lines. I suspect cudaMallocPitch is executing the first defined cudaDeviceProp::textureAlignment . For instance:

 #include <cstdio> int main(void) { const int ncases = 12; const size_t widths[ncases] = { 5, 10, 20, 50, 70, 90, 100, 200, 500, 700, 900, 1000 }; const size_t height = 10; float *vals[ncases]; size_t pitches[ncases]; struct cudaDeviceProp p; cudaGetDeviceProperties(&p, 0); fprintf(stdout, "Texture alignment = %zd bytes\n", p.textureAlignment); cudaSetDevice(0); cudaFree(0); // establish context for(int i=0; i<ncases; i++) { cudaMallocPitch((void **)&vals[i], &pitches[i], widths[i], height); fprintf(stdout, "width = %zd <=> pitch = %zd \n", widths[i], pitches[i]); } return 0; } 

which gives the following on the GT320M:

 Texture alignment = 256 bytes width = 5 <=> pitch = 256 width = 10 <=> pitch = 256 width = 20 <=> pitch = 256 width = 50 <=> pitch = 256 width = 70 <=> pitch = 256 width = 90 <=> pitch = 256 width = 100 <=> pitch = 256 width = 200 <=> pitch = 256 width = 500 <=> pitch = 512 width = 700 <=> pitch = 768 width = 900 <=> pitch = 1024 width = 1000 <=> pitch = 1024 

I assume cudaDeviceProp::texturePitchAlignment applies to arrays.

+3
source

After some experiments with memory allocation, finally I found a working solution that saves memory. If I force the alignment of memory allocated by cudaMalloc , cudaBindTexture2D works fine.

 cudaError_t alignedMalloc2D(void** ptr, int width, int height, int* pitch, int alignment = 32) { if((width% alignment) != 0) width+= (alignment - (width % alignment)); (*pitch) = width; return cudaMalloc(ptr,width* height); } 

The memory allocated by this function is 32-byte aligned, which is a requirement of cudaBindTexture2D . Memory usage is now reduced by 16 times, and all CUDA features that use 2D textures also work correctly.

Here is a small utility function to get the current pitch adjustment value of the CUDA device.

 int getCurrentDeviceTexturePitchAlignment() { cudaDeviceProp prop; int currentDevice = 0; cudaGetDevice(&currentDevice); cudaGetDeviceProperties(&prop,currentDevice); return prop.texturePitchAlignment; } 
+2
source

Source: https://habr.com/ru/post/1435731/


All Articles