It appears that the pinned (page-locked) host allocator in CUDA 6.5 uses mmap() with MAP_FIXED under the hood. Although I am not an OS expert, I believe this has the effect of "pinning" the memory, i.e. ensuring that its address never changes.
Consider a short test program:
    #include <stdio.h>
    #include <stdlib.h>   // system()

    #define DSIZE (1048576*1024)   // 1 GiB

    int main(){
      int *data;
      cudaFree(0);   // force CUDA context creation before the first snapshot
      system("cat /proc/meminfo > out1.txt");
      printf("*$*before alloc\n");
      cudaHostAlloc(&data, DSIZE, cudaHostAllocDefault);
      printf("*$*after alloc\n");
      system("cat /proc/meminfo > out2.txt");
      cudaFreeHost(data);
      system("cat /proc/meminfo > out3.txt");
      return 0;
    }
If we run this program under strace and excerpt the output between the two printf statements, we have:
    write(1, "*$*before alloc\n", 16*$*before alloc) = 16
    mmap(0x204500000, 1073741824, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED|MAP_ANONYMOUS, 0, 0) = 0x204500000
    ioctl(11, 0xc0304627, 0x7fffcf72cce0) = 0
    ioctl(3, 0xc0384657, 0x7fffcf72cd70) = 0
    write(1, "*$*after alloc\n", 15*$*after alloc) = 15
(note that 1073741824 is exactly one gigabyte, i.e. the same as the requested 1048576 * 1024)
Reviewing the description of mmap, we have:
"address gives a preferred starting address for the mapping. NULL expresses no preference. Any previous mapping at that address is automatically removed. The address you give may still be changed, unless you use the MAP_FIXED flag."
Therefore, if the mmap call succeeds, the requested address is fixed, and so the corresponding memory is presumably "pinned".
This mechanism apparently does not use mlock(), so the count of mlock'ed pages does not change before and after the allocation. However, we would expect the mapping statistic to change, and if we diff out1.txt and out2.txt created by the above program, we see (excerpt):
    < Mapped:           87488 kB
    ---
    > Mapped:         1135904 kB
The difference (1048416 kB) is approximately one gigabyte, the requested amount of "pinned" memory.