Linux overcommit heuristic

The overcommit article in the kernel documentation only mentions that overcommit mode 0 is based on heuristic handling. It does not describe the heuristic itself.

Can someone shed light on what the actual heuristic is? Any relevant link to the kernel sources also works!

Thank you in advance

1 answer

Actually, the kernel documentation on overcommit accounting has some details: https://www.kernel.org/doc/Documentation/vm/overcommit-accounting

The Linux kernel supports the following overcommit handling modes

0 - Heuristic overcommit handling.

Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.

Also, Documentation/sysctl/vm.txt:

overcommit_memory: This value contains a flag that enables memory overcommitment.
When this flag is 0, the kernel attempts to estimate the amount of free memory left when userspace requests more memory ...

See Documentation/vm/overcommit-accounting and mm/mmap.c::__vm_enough_memory() for more information.

In addition, man 5 proc :

/proc/sys/vm/overcommit_memory
This file contains the kernel virtual memory accounting mode. Values are:

  0: heuristic overcommit (this is the default)
  1: always overcommit, never check
  2: always check, never overcommit

In mode 0, calls of mmap(2) with MAP_NORESERVE are not checked, and the default check is very weak, leading to the risk of getting a process "OOM-killed".

Thus, very large allocations are refused by the heuristic, but an application can sometimes allocate more virtual memory than the system has physical memory, as long as it does not use all of it. With MAP_NORESERVE the amount of mmap-able memory may be higher.
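To check which of the three modes a machine is running, the value can be read from /proc/sys/vm/overcommit_memory. The following is a minimal sketch in C, assuming a Linux system with procfs mounted. The enum and the helper names (parse_overcommit_mode, read_overcommit_mode) are made up for this illustration; they are not kernel or libc APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrative names; the numeric values match the three documented modes. */
enum overcommit_mode {
    MODE_INVALID = -1,
    MODE_HEURISTIC = 0, /* heuristic overcommit (default) */
    MODE_ALWAYS = 1,    /* always overcommit, never check */
    MODE_NEVER = 2      /* always check, never overcommit */
};

/* Parse the text found in /proc/sys/vm/overcommit_memory, e.g. "0\n". */
enum overcommit_mode parse_overcommit_mode(const char *text)
{
    char *end;
    long value;

    assert(text != NULL);
    value = strtol(text, &end, 10);
    if (end == text || value < 0 || value > 2)
        return MODE_INVALID;
    return (enum overcommit_mode)value;
}

/* Read the current mode; returns MODE_INVALID if the file is unreadable. */
enum overcommit_mode read_overcommit_mode(void)
{
    char buf[16];
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");

    if (!f)
        return MODE_INVALID;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return MODE_INVALID;
    }
    fclose(f);
    return parse_overcommit_mode(buf);
}
```

On a system left at its defaults, read_overcommit_mode() would be expected to return MODE_HEURISTIC.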

The documentation says "The overcommit policy is set via the sysctl `vm.overcommit_memory'", so we can find how it is implemented in the source code: http://lxr.free-electrons.com/ident?v=4.4;i=sysctl_overcommit_memory . It is defined at line 112 of mm/mmap.c:

    int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS;  /* heuristic overcommit */

The constant OVERCOMMIT_GUESS (defined in linux/mman.h) is actually checked only at line 170 of mm/mmap.c, and this is the implementation of the heuristic:

    /*
     * Check that a process has enough memory to allocate a new virtual
     * mapping. 0 means there is enough memory for the allocation to
     * succeed and -ENOMEM implies there is not.
     *
     * We currently support three overcommit policies, which are set via the
     * vm.overcommit_memory sysctl.  See Documentation/vm/overcommit-accounting
     *
     * Strict overcommit modes added 2002 Feb 26 by Alan Cox.
     * Additional code 2002 Jul 20 by Robert Love.
     *
     * cap_sys_admin is 1 if the process has admin privileges, 0 otherwise.
     *
     * Note this is a helper function intended to be used by LSMs which
     * wish to use this logic.
     */
    int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
    ...
        if (sysctl_overcommit_memory == OVERCOMMIT_GUESS) {
            free = global_page_state(NR_FREE_PAGES);
            free += global_page_state(NR_FILE_PAGES);

            /*
             * shmem pages shouldn't be counted as free in this
             * case, they can't be purged, only swapped out, and
             * that won't affect the overall amount of available
             * memory in the system.
             */
            free -= global_page_state(NR_SHMEM);

            free += get_nr_swap_pages();

            /*
             * Any slabs which are created with the
             * SLAB_RECLAIM_ACCOUNT flag claim to have contents
             * which are reclaimable, under pressure.  The dentry
             * cache and most inode caches should fall into this
             */
            free += global_page_state(NR_SLAB_RECLAIMABLE);

            /*
             * Leave reserved pages. The pages are not for anonymous pages.
             */
            if (free <= totalreserve_pages)
                goto error;
            else
                free -= totalreserve_pages;

            /*
             * Reserve some for root
             */
            if (!cap_sys_admin)
                free -= sysctl_admin_reserve_kbytes >> (PAGE_SHIFT - 10);

            if (free > pages)
                return 0;

            goto error;
        }

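So, the heuristic is a way to estimate how many pages of memory could be made available right now (free) at the moment a request for more memory is processed (the application asks for pages pages). The estimate adds up free pages, page cache (minus shmem), free swap and reclaimable slab, then subtracts the reserves; the request succeeds only if free is greater than pages.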
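The arithmetic of the mode 0 check can be reproduced in user space. This is only a sketch that mirrors the 4.4 code quoted above: the struct, its field names and heuristic_allows are hypothetical, and the numbers a real kernel uses come from its internal counters, not from a struct like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Snapshot of the counters the heuristic reads, all in pages unless noted.
 * Field names are illustrative; comments name the kernel source they model. */
struct mem_snapshot {
    long free_pages;       /* NR_FREE_PAGES */
    long file_pages;       /* NR_FILE_PAGES (page cache) */
    long shmem_pages;      /* NR_SHMEM */
    long swap_free_pages;  /* get_nr_swap_pages() */
    long slab_reclaimable; /* NR_SLAB_RECLAIMABLE */
    long totalreserve;     /* totalreserve_pages */
    long admin_reserve_kb; /* vm.admin_reserve_kbytes, in kilobytes */
    int page_shift;        /* log2 of the page size, 12 for 4 KiB pages */
};

/* Returns true if a request for `pages` pages would pass the mode 0 check. */
bool heuristic_allows(const struct mem_snapshot *s, long pages, bool is_admin)
{
    long free;

    assert(s != NULL && s->page_shift >= 10);

    free = s->free_pages + s->file_pages - s->shmem_pages
         + s->swap_free_pages + s->slab_reclaimable;

    /* Reserved pages are never handed out to anonymous mappings. */
    if (free <= s->totalreserve)
        return false;
    free -= s->totalreserve;

    /* Ordinary users additionally leave the admin reserve untouched. */
    if (!is_admin)
        free -= s->admin_reserve_kb >> (s->page_shift - 10);

    return free > pages;
}
```

With 1000 free, 500 file, 100 shmem, 200 swap and 50 reclaimable slab pages, a reserve of 150 pages and an admin reserve of 400 kB (100 pages of 4 KiB), an ordinary user sees 1400 pages, so a request for 1399 pages passes and one for 1400 is refused, while root sees 1500.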

When overcommit is always enabled (mode "1"), this function always returns 0 ("there is enough memory for this request"):

    /*
     * Sometimes we want to use more memory than we have
     */
    if (sysctl_overcommit_memory == OVERCOMMIT_ALWAYS)
        return 0;

Without the default heuristic, in mode "2", the kernel first accounts for the requested pages pages to get the new Committed_AS (from /proc/meminfo):

    vm_acct_memory(pages);
    ...

This actually just increments vm_committed_as: __percpu_counter_add(&vm_committed_as, pages, vm_committed_as_batch);

    allowed = vm_commit_limit();

The magic is here:

    /*
     * Committed memory limit enforced when OVERCOMMIT_NEVER policy is used
     */
    unsigned long vm_commit_limit(void)
    {
        unsigned long allowed;

        if (sysctl_overcommit_kbytes)
            allowed = sysctl_overcommit_kbytes >> (PAGE_SHIFT - 10);
        else
            allowed = ((totalram_pages - hugetlb_total_pages())
                       * sysctl_overcommit_ratio / 100);
        allowed += total_swap_pages;

        return allowed;
    }

So, allowed is set either in kilobytes through the vm.overcommit_kbytes sysctl or, through vm.overcommit_ratio, as a percentage of physical memory (not counting huge pages). In both cases the total swap size is added on top.
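As a worked example of that limit, here is a small user-space sketch of the same formula. The function name commit_limit_pages and its parameter list are made up for illustration; the kernel reads these values from its own globals instead of taking arguments.

```c
#include <assert.h>

/* User-space model of the mode 2 commit limit, in pages.
 * A non-zero overcommit_kbytes takes precedence over overcommit_ratio. */
unsigned long commit_limit_pages(unsigned long totalram_pages,
                                 unsigned long hugetlb_pages,
                                 unsigned long total_swap_pages,
                                 unsigned long overcommit_kbytes,
                                 unsigned long overcommit_ratio,
                                 int page_shift)
{
    unsigned long allowed;

    assert(page_shift >= 10 && hugetlb_pages <= totalram_pages);

    if (overcommit_kbytes)
        allowed = overcommit_kbytes >> (page_shift - 10);
    else
        allowed = (totalram_pages - hugetlb_pages) * overcommit_ratio / 100;

    /* Swap is added in both cases. */
    return allowed + total_swap_pages;
}
```

For instance, with 8 GiB of RAM (2097152 pages of 4 KiB), no huge pages, 2 GiB of swap (524288 pages) and a ratio of 50, the limit is 1048576 + 524288 = 1572864 pages, that is 6 GiB.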

    /*
     * Reserve some for root
     */
    if (!cap_sys_admin)
        allowed -= sysctl_admin_reserve_kbytes >> (PAGE_SHIFT - 10);

Keep a certain amount of memory for root only (PAGE_SHIFT is 12 on systems with the usual 4 KiB pages; the shift by PAGE_SHIFT - 10 is just a conversion from kilobytes to a number of pages).
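The shift works because a page holds 2^(PAGE_SHIFT - 10) kilobytes. A tiny sketch of the conversion, with a helper name (kbytes_to_pages) that is invented here and does not exist in the kernel:

```c
#include <assert.h>

/* Convert kilobytes to whole pages for a page size of 2^page_shift bytes.
 * Any remainder smaller than one page is dropped by the shift. */
unsigned long kbytes_to_pages(unsigned long kbytes, int page_shift)
{
    assert(page_shift >= 10);
    return kbytes >> (page_shift - 10);
}
```

For example, a reserve of 8192 kB is 2048 pages when PAGE_SHIFT is 12 (4 KiB pages), and 128 pages when PAGE_SHIFT is 16 (64 KiB pages).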

    /*
     * Don't let a single process grow so big a user can't recover
     */
    if (mm) {
        reserve = sysctl_user_reserve_kbytes >> (PAGE_SHIFT - 10);
        allowed -= min_t(long, mm->total_vm / 32, reserve);
    }

    if (percpu_counter_read_positive(&vm_committed_as) < allowed)
        return 0;

If, after accounting for the request, the total committed memory of all of user space is still below allowed, the request is granted. Otherwise the request is refused (and its accounting is undone):

    error:
        vm_unacct_memory(pages);

        return -ENOMEM;
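The account-then-check-then-undo pattern of mode 2 can be modelled in a few lines. This is a toy sketch, not kernel code: struct commit_state and strict_enough_memory are hypothetical names, the counter is a plain long instead of a per-CPU counter, and allowed is assumed to already have the root and per-process reserves subtracted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for the global commit accounting, in pages. */
struct commit_state {
    long committed; /* models vm_committed_as */
    long allowed;   /* commit limit with reserves already subtracted */
};

/* Account for the request first, keep it only if the total stays below
 * the limit. Returns 0 on success and -ENOMEM on refusal. */
int strict_enough_memory(struct commit_state *st, long pages)
{
    assert(st != NULL && pages >= 0);

    st->committed += pages;      /* models vm_acct_memory(pages) */
    if (st->committed < st->allowed)
        return 0;

    st->committed -= pages;      /* models vm_unacct_memory(pages) */
    return -ENOMEM;
}
```

Note that the comparison is strict: with a limit of 100 pages and 60 already committed, a request for 40 more is refused and leaves the counter at 60, while a request for 39 succeeds.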

In other words, as stated in "The Linux kernel. Some remarks on the Linux Kernel", 2003-02-01, by Andries Brouwer, 9. Memory, 9.6 Overcommit and OOM - https://www.win.tue.nl/~aeb/linux/lk/lk-9.html :

Going in the right direction

Since 2.5.30 the values are:

  • 0 (default): as before: guess about how much overcommitment is reasonable,
  • 1: never refuse any malloc(),
  • 2: be precise about the overcommit: never commit a virtual address space larger than swap space plus a fraction overcommit_ratio of the physical memory.

So, mode "2" is a precise calculation of the amount of memory committed after the request, and mode "0" is a heuristic estimate.


Source: https://habr.com/ru/post/1015206/