This is a somewhat speculative answer, but keep in mind that there are two alignment properties an allocation must satisfy for textures: one for the texture pointer and one for the texture rows. I suspect cudaMallocPitch honors the former, which is defined by cudaDeviceProp::textureAlignment. For instance:
    #include <cstdio>
    #include <cuda_runtime.h>

    int main(void)
    {
        const int ncases = 12;
        const size_t widths[ncases] = { 5, 10, 20, 50, 70, 90, 100, 200, 500, 700, 900, 1000 };
        const size_t height = 10;
        float *vals[ncases];
        size_t pitches[ncases];

        struct cudaDeviceProp p;
        cudaGetDeviceProperties(&p, 0);
        fprintf(stdout, "Texture alignment = %zu bytes\n", p.textureAlignment);

        cudaSetDevice(0);
        cudaFree(0); // establish context

        for (int i = 0; i < ncases; i++) {
            // the requested width is given in bytes
            cudaMallocPitch((void **)&vals[i], &pitches[i], widths[i], height);
            fprintf(stdout, "width = %zu <=> pitch = %zu \n", widths[i], pitches[i]);
        }

        return 0;
    }
which gives the following on the GT320M:
    Texture alignment = 256 bytes
    width = 5 <=> pitch = 256
    width = 10 <=> pitch = 256
    width = 20 <=> pitch = 256
    width = 50 <=> pitch = 256
    width = 70 <=> pitch = 256
    width = 90 <=> pitch = 256
    width = 100 <=> pitch = 256
    width = 200 <=> pitch = 256
    width = 500 <=> pitch = 512
    width = 700 <=> pitch = 768
    width = 900 <=> pitch = 1024
    width = 1000 <=> pitch = 1024
I assume cudaDeviceProp::texturePitchAlignment applies to arrays.