CUDA and pinned (page-locked) memory not page-locked at all?

I am trying to find out whether CUDA (or the OpenCL implementation) is telling the truth when I request pinned (page-locked) memory.

I tried cudaMallocHost and watched the Mlocked and Unevictable values in /proc/meminfo; both stay at 0 and never rise (/proc/<pid>/status also reports VmLck as 0). When I page-locked memory with mlock instead, the values rose as expected.
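For reference, the mlock side of that comparison can be sketched in plain C roughly as follows. The helper names (`parse_kb_field`, `read_proc_kb`, `vmlck_growth_after_mlock`) are made up for this illustration, and the sketch assumes Linux with /proc mounted and an RLIMIT_MEMLOCK large enough for the request; it is not taken from the original test.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: find a line of `text` that starts with `key` and
   return the number following it (in kB), or -1 if no line matches. */
long parse_kb_field(const char *text, const char *key)
{
    assert(text != NULL && key != NULL);
    size_t klen = strlen(key);
    const char *line = text;
    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0)
            return strtol(line + klen, NULL, 10);  /* strtol skips blanks */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;                                /* step past the newline */
    }
    return -1;
}

/* Hypothetical helper: read a small /proc file and extract one field. */
long read_proc_kb(const char *path, const char *key)
{
    char buf[8192];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_kb_field(buf, key);
}

/* Lock `bytes` of heap memory with mlock() and return how much VmLck grew
   (in kB), or -1 if reading the counter, allocating, or locking failed. */
long vmlck_growth_after_mlock(size_t bytes)
{
    long before = read_proc_kb("/proc/self/status", "VmLck:");
    void *p = malloc(bytes);
    if (p == NULL || before < 0 || mlock(p, bytes) != 0) {
        free(p);
        return -1;
    }
    long after = read_proc_kb("/proc/self/status", "VmLck:");
    munlock(p, bytes);
    free(p);
    return after - before;
}
```

Calling `vmlck_growth_after_mlock(1 << 20)` from `main` and printing the result should report a growth of roughly 1024 kB when the lock succeeds; a result of -1 usually means the locked-memory limit is too low.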

Thus, there are two possible causes for this behavior:

  • I am not actually getting page-locked memory from the CUDA API, and the cudaSuccess return value is misleading
  • CUDA bypasses the OS counters for page-locked memory because it does some magic with the Linux kernel

So the actual question is: why can't I see page-locked memory in the OS counters when I allocate it through CUDA?

Optional: where can I get the correct values, if not from /proc/meminfo or /proc/<pid>/status?

Thanks!

System: Ubuntu 14.04.01 LTS; CUDA 6.5; Nvidia driver 340.29; Nvidia Tesla K20c

1 answer

It appears that the pinned allocator in CUDA 6.5 uses mmap() with MAP_FIXED under the hood. Although I am not an OS expert, I believe this has the effect of "pinning" the memory, i.e. ensuring that its address never changes.

Consider a short test program:

    #include <stdio.h>
    #include <stdlib.h>        /* system() */
    #include <cuda_runtime.h>  /* cudaHostAlloc(), cudaFreeHost() */

    #define DSIZE (1048576*1024)  /* 1 GiB */

    int main(){
      int *data;
      cudaFree(0);  /* set up the CUDA context before the first snapshot */
      system("cat /proc/meminfo > out1.txt");
      printf("*$*before alloc\n");
      cudaHostAlloc(&data, DSIZE, cudaHostAllocDefault);
      printf("*$*after alloc\n");
      system("cat /proc/meminfo > out2.txt");
      cudaFreeHost(data);
      system("cat /proc/meminfo > out3.txt");
      return 0;
    }

If we run this program under strace and extract the output between the two printf statements, we have:

    write(1, "*$*before alloc\n", 16*$*before alloc
    ) = 16
    mmap(0x204500000, 1073741824, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED|MAP_ANONYMOUS, 0, 0) = 0x204500000
    ioctl(11, 0xc0304627, 0x7fffcf72cce0) = 0
    ioctl(3, 0xc0384657, 0x7fffcf72cd70) = 0
    write(1, "*$*after alloc\n", 15*$*after alloc
    ) = 15

(note that 1073741824 is exactly one gigabyte, i.e. the same as the requested 1048576 * 1024)

Reviewing the description of mmap, we have:

address

gives a preferred starting address for the mapping. NULL expresses no preference. Any previous mapping at that address is automatically removed. The address you give may still be changed, unless you use the MAP_FIXED flag.

Therefore, if the mmap call succeeds, the requested virtual address is fixed, and in that sense the memory is "pinned".
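The address-fixing behavior of MAP_FIXED can be checked without CUDA. The following is a minimal C sketch, not what the CUDA runtime itself does: the helper name `map_fixed_keeps_address` is hypothetical, and it first reserves an address range with an ordinary mmap() so that the MAP_FIXED call cannot replace an unrelated mapping. It uses fd -1 for the anonymous mapping, which is the portable choice, whereas the strace output shows 0.

```c
#define _DEFAULT_SOURCE   /* expose MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: returns 1 if a MAP_FIXED mapping lands exactly on the
   requested address, 0 if it does not, and -1 if either mmap() call fails. */
int map_fixed_keeps_address(size_t bytes)
{
    assert(bytes > 0);

    /* Reserve an address range so MAP_FIXED only replaces our own mapping. */
    void *hint = mmap(NULL, bytes, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (hint == MAP_FAILED)
        return -1;

    /* Same protection and flags as in the strace output; the previous
       mapping at `hint` is removed and replaced by this one. */
    void *p = mmap(hint, bytes, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        munmap(hint, bytes);
        return -1;
    }

    int same = (p == hint);
    munmap(p, bytes);
    return same;
}
```

Note that this only demonstrates that the virtual address is honored; it says nothing about whether the physical pages behind it are locked, which is presumably what the driver ioctl calls in the trace are for.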

This mechanism apparently does not use mlock(), so the mlock'ed page counts are the same before and after. However, we would expect a change in the mapping statistics, and if we diff out1.txt and out2.txt produced by the program above, we see (excerpt):

    < Mapped:            87488 kB
    ---
    > Mapped:          1135904 kB

The difference is approximately one gigabyte, the requested amount of "pinned" memory.


Source: https://habr.com/ru/post/1206720/

