What segments are affected by copy-on-write?

My understanding of copy-on-write is that "everyone shares a single copy of the same data until somebody writes to it, and only then is a copy made."

  • Does that shared copy of the data consist of both the heap and the bss segment, or just the heap?
  • Which memory segments end up being shared, and does that depend on the OS?
memory-management linux operating-system
May 05 '16 at 10:17
2 answers

The OS can implement whatever copy-on-write policy it wants, but, as a rule, they all do roughly the same thing (i.e., whatever makes the most sense).

To be clear, on a POSIX-like system (linux, BSD, OSX) there are four areas (what you called segments): data (where int x = 1; goes), bss (where int y; goes), sbrk (that's the heap / malloc arena), and the stack.
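
To make those four areas concrete, here is a small illustrative C program; the printed addresses will differ from system to system, but each variable lands in the area named in its comment:

    #include <stdio.h>
    #include <stdlib.h>

    int x = 1;                          /* initialized global -> data area */
    int y;                              /* uninitialized global -> bss area */

    int main(void)
    {
        int local = 3;                  /* automatic variable -> stack */
        int *p = malloc(sizeof *p);     /* dynamic allocation -> heap (sbrk/mmap) */

        printf("data:  %p\n", (void *)&x);
        printf("bss:   %p\n", (void *)&y);
        printf("heap:  %p\n", (void *)p);
        printf("stack: %p\n", (void *)&local);

        free(p);
        return 0;
    }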

When a fork is done, the OS sets up a new page map for the child that shares all of the parent's pages. Then, in both the parent's and the child's page maps, all the pages are marked read-only.

Each page mapping also has a reference count that says how many processes are sharing the page. Before the fork, the refcount is 1; after it, 2.

Now, when either process tries to write to an R/O page, it gets a page fault. The OS sees that the page is marked for copy-on-write, creates a private page for that process, copies the data from the shared one, marks the page writable for that process, and resumes it.

It also decrements the refcount. If the refcount is now [again] 1, the OS can mark the page in the other process as writable and not shared [this avoids a second page fault in the other process, a speedup that is possible only because at this point the OS knows the other process should be free to write with no problem]. This speedup may be OS dependent.
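
As a rough sketch of the effect as seen from user space (not of the page-table bookkeeping itself), here is a toy program in which the child writes to a variable inherited across the fork; each process ends up with its own copy of that page:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int shared_value = 42;              /* in a page shared R/O right after fork */

    int main(void)
    {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            exit(1);
        }

        if (pid == 0) {                 /* child: this write triggers the COW fault */
            shared_value = 99;
            printf("child  sees %d\n", shared_value);    /* 99 */
            exit(0);
        }

        wait(NULL);                     /* parent: still sees its own (original) page */
        printf("parent sees %d\n", shared_value);        /* 42 */
        return 0;
    }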

Actually, the bss area gets even more special treatment. In the initial page mapping for it, all pages are mapped to a single page that contains all zeros (aka the "zero page"). The mapping is marked R/O. Thus, the bss area can be gigabytes in size and it will still occupy only one physical page. This single, special zero page is shared between the bss areas of all processes, whether or not they have any relation to one another at all.

Thus, a process can read from any page in the area and get what it expects: zeros. Only when the process tries to write to such a page does the same copy-on-write mechanism kick in: the process gets a private page, the mapping is adjusted, and the process resumes. It is now free to write to the page as it sees fit.
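
Here is a Linux-specific sketch of that (it assumes /proc/self/statm, whose second field is the resident set size in pages): a large bss array costs almost no physical memory until it is written to:

    #include <stdio.h>
    #include <string.h>

    static char big[256 * 1024 * 1024];     /* 256 MB of bss */

    /* second field of /proc/self/statm is the resident set size, in pages */
    static long rss_pages(void)
    {
        long size, rss = -1;
        FILE *f = fopen("/proc/self/statm", "r");
        if (f) {
            if (fscanf(f, "%ld %ld", &size, &rss) != 2)
                rss = -1;
            fclose(f);
        }
        return rss;
    }

    int main(void)
    {
        printf("rss before touching bss: %ld pages\n", rss_pages());

        /* reads are satisfied from the shared zero page; no private copies made */
        printf("big[0] = %d, big[last] = %d\n", big[0], big[sizeof big - 1]);

        memset(big, 1, sizeof big);          /* writes fault in private pages */
        printf("rss after writing bss:  %ld pages\n", rss_pages());
        return 0;
    }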

Again, the OS can choose its own policy. For example, after a fork, it might be more efficient to share most of the stack pages, but start off with private copies of the "current" page, as determined by the value of the stack pointer register.

When the exec syscall is done [by the child], the kernel has to undo most of the mappings that were done during the fork [bumping down the refcounts], release the child's mappings, etc., and restore the parent's original page protections (i.e., the parent no longer shares its pages unless it does another fork).
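
For reference, this is the classic fork/exec/wait pattern being described ( /bin/ls is just a stand-in command):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();                 /* child shares the parent's pages, COW */
        if (pid < 0) {
            perror("fork");
            exit(1);
        }

        if (pid == 0) {
            /* exec tears down the COW mappings and builds fresh ones
               from the new executable */
            execl("/bin/ls", "ls", "-l", (char *)NULL);
            perror("execl");                /* only reached if exec failed */
            _exit(127);
        }

        int status;
        waitpid(pid, &status, 0);
        return 0;
    }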




Although it is not part of your original question, there are related activities that may be of interest, such as demand loading [of pages] and demand linking [of symbols] after the exec syscall.

When a process does an exec, the kernel performs the cleanup above and reads a small portion of the executable file to determine its object format. The dominant format is ELF, but any format the kernel understands can be used (e.g., OSX can use ELF [IIRC], but it also has others).

For ELF, the executable has a special section that gives the full FS path of the so-called "ELF interpreter", which is a shared library and is usually /lib64/ld.linux.so .
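
If you want to see that section yourself, readelf -l <file> prints it, or you can read it directly; here is a minimal Linux-only sketch using <elf.h> that prints the PT_INTERP path of a 64-bit executable (error checking kept to a minimum):

    #include <elf.h>
    #include <stdio.h>

    /* print the PT_INTERP path of a 64-bit ELF executable */
    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <elf-executable>\n", argv[0]);
            return 1;
        }

        FILE *f = fopen(argv[1], "rb");
        if (!f) {
            perror("fopen");
            return 1;
        }

        Elf64_Ehdr ehdr;
        if (fread(&ehdr, sizeof ehdr, 1, f) != 1) {
            fprintf(stderr, "short read\n");
            return 1;
        }

        for (unsigned i = 0; i < ehdr.e_phnum; i++) {
            Elf64_Phdr phdr;
            fseek(f, (long)(ehdr.e_phoff + (Elf64_Off)i * ehdr.e_phentsize), SEEK_SET);
            if (fread(&phdr, sizeof phdr, 1, f) != 1)
                break;
            if (phdr.p_type == PT_INTERP) {
                char path[256];
                fseek(f, (long)phdr.p_offset, SEEK_SET);
                size_t n = fread(path, 1, sizeof path - 1, f);
                path[n] = '\0';
                printf("ELF interpreter: %s\n", path);
            }
        }

        fclose(f);
        return 0;
    }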

The kernel, using an internal form of mmap , will map this ELF interpreter into the application's address space and also set up the mappings for the executable itself. Most things are marked R/O, and no pages are loaded yet.

Before we go any further, we need to talk about the "backing store" for a page. That is, when a page fault occurs and we have to load the page from disk, where does it come from? For the heap / malloc area, this is usually the swap disk (aka the paging disk).

On linux, this is usually the partition of type "linux swap" that was set up when the system was installed. When a page is dirty and has to be flushed to disk to free up some physical memory, that is where it gets written. Note that the page-sharing algorithm in the first section above still applies.

However, when an executable is first mapped into memory, its backing store is the executable file in the filesystem.

So, the kernel sets the application's program counter to point to the starting location of the ELF interpreter and transfers control to it.

The ELF interpreter goes about its business. Every time it tries to execute a part of itself [a code page] that is mapped but not loaded, a page fault occurs, loads that page from the backing store (e.g., the ELF interpreter's file), and changes the mapping to R/O but present.

This happens for the ELF interpreter, shared libraries, and the executable itself.
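
One way to watch demand loading happen is mincore(2), which reports which pages of a mapping are currently resident; a Linux sketch ( /bin/ls is used purely as a convenient file to map):

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* count how many pages of the mapping are currently resident */
    static size_t resident_pages(void *addr, size_t len)
    {
        size_t page = (size_t)sysconf(_SC_PAGESIZE);
        size_t npages = (len + page - 1) / page;
        unsigned char *vec = malloc(npages);
        size_t n = 0;

        if (vec && mincore(addr, len, vec) == 0)
            for (size_t i = 0; i < npages; i++)
                n += vec[i] & 1;
        free(vec);
        return n;
    }

    int main(void)
    {
        const char *path = "/bin/ls";           /* any ordinary file will do */
        int fd = open(path, O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) {
            perror(path);
            return 1;
        }

        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* numbers depend on kernel readahead, but the trend is visible */
        printf("resident before touching: %zu pages\n", resident_pages(p, st.st_size));

        volatile char sum = 0;
        for (off_t i = 0; i < st.st_size; i += 4096)
            sum += p[i];                        /* each touch can fault a page in */

        printf("resident after touching:  %zu pages\n", resident_pages(p, st.st_size));

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }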

The ELF interpreter will now use mmap to map libc into the application space [again, demand loaded]. If the ELF interpreter needs to modify a code page to relocate a symbol [or tries to write to any page whose backing store is a file, such as a data page], a protection fault occurs; the kernel changes the page's backing store from the on-disk file to a page on the swap disk, adjusts the protections, and resumes the application.

The kernel must also handle the case where the ELF interpreter (for example) tries to write to a [say] data page that has not been loaded yet (i.e., it has to load the page first, and then switch its backing store to the swap disk).

The ELF interpreter then uses portions of libc to help it complete its initial linking activities. It relocates the minimum necessary to allow itself to do its job.

However, the ELF interpreter does not relocate anywhere near all the symbols for most other shared libraries. It will look at the executable and, again using mmap , create mappings for the shared libraries the executable needs (i.e., what you see when you run ldd executable ).

These mappings for shared libraries and executables can be thought of as “segments”.
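
On Linux you can see these mappings for any running process in /proc/<pid>/maps; a trivial program that dumps its own:

    #include <stdio.h>

    /* dump this process's own memory mappings (Linux-specific) */
    int main(void)
    {
        FILE *f = fopen("/proc/self/maps", "r");
        if (!f) {
            perror("/proc/self/maps");
            return 1;
        }

        int c;
        while ((c = fgetc(f)) != EOF)
            putchar(c);

        fclose(f);
        return 0;
    }

Each line of the output is one mapping: its address range, permissions, and the backing file (the executable, libc, the ELF interpreter, and so on), plus areas such as [heap] and [stack].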

Each shared library has a symbol jump table whose entries point back into the interpreter. But the ELF interpreter makes only minimal changes to it up front.

[Note: this is a loose explanation.] Only when the application actually tries to call a given function's jump-table entry [that is the PLT/GOT et al. stuff you may have seen] does a relocation occur. The jump entry transfers control to the interpreter, which finds the real address of the symbol, patches the GOT so that it now points directly to the symbol's final address, and redoes the call, which now invokes the real function. The next time that same function is called, it goes straight there.

This is called demand linking [lazy binding].
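
The dlopen / dlsym interface is the explicit, programmatic cousin of this lazy resolution; a small sketch that resolves cos from the math library at run time (libm.so.6 is the usual soname on Linux):

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        /* open libm and resolve "cos" at run time, much as the lazy-binding
           machinery resolves a symbol the first time it is called */
        void *handle = dlopen("libm.so.6", RTLD_LAZY);
        if (!handle) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return 1;
        }

        double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
        if (!cosine) {
            fprintf(stderr, "dlsym: %s\n", dlerror());
            return 1;
        }

        printf("cos(0.0) = %f\n", cosine(0.0));
        dlclose(handle);
        return 0;
    }

(Link with -ldl on older glibc.) Setting LD_BIND_NOW=1 in the environment makes the interpreter resolve everything up front instead, which is a handy way to see the difference in startup cost.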

A by-product of all this mmap activity is that the classic sbrk syscall is practically useless. It would quickly collide with one of the shared library memory mappings.

So modern libc doesn't use it. When malloc needs more memory from the OS, it requests more via an anonymous mmap and keeps track of which allocations belong to which mmap mapping (i.e., if enough memory gets freed to cover an entire mapping, free can do a munmap ).
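
This is the kind of call a malloc implementation makes under the hood; a toy sketch that uses an anonymous, private mapping directly:

    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 1 << 20;               /* 1 MB */

        /* anonymous, private mapping: what a modern malloc asks the kernel
           for when it needs more memory (instead of calling sbrk) */
        char *block = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (block == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        memcpy(block, "hello", 6);
        printf("%s from an anonymous mapping at %p\n", block, (void *)block);

        /* when the whole region is free again, it can be handed back with munmap */
        munmap(block, len);
        return 0;
    }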

So, to summarize: we have "copy-on-write", "demand loading", and "demand linking" all going on at the same time. It seems complicated, but it makes fork and exec fast and smooth. It adds some complexity, but the extra overhead is only incurred when actually needed ("on demand").

Thus, instead of a big lag/delay at program startup, the overhead activity gets spread out across the lifetime of the program, as needed.

May 05 '16 at 21:44

To understand this better, you should remove the term "segment" from your vocabulary. Most systems operate on pages, not segments. With 64-bit mode, Intel finally did away with segments.

You should be asking, "Which pages are affected by copy-on-write?"

Those will be pages that are writable and shared by several processes, at the moment one of those processes writes to them.

This can happen after a fork. One way to implement fork is to create a complete copy of the parent process's address space. However, that can be a lot of work, especially since most of the time the child does an exec right after the fork.

The alternative is for the parent and child to share the same memory. That works fine for read-only memory, but it has obvious problems if multiple processes can write to the same memory.

This can be overcome by having the processes share the read/write memory until one of them writes to it. When that happens, the page gets unshared for the writing process: the OS allocates a new page frame, maps it into the writing process's address space, copies the original data into it, and then lets the writing process continue.
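
The same copy-on-write idea is visible outside of fork as well: a MAP_PRIVATE file mapping lets a process scribble on the pages without the file on disk ever changing. A sketch ( example.txt is a placeholder for any existing, non-empty file):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "example.txt";   /* placeholder: any existing file */
        int fd = open(path, O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) {
            perror(path);
            return 1;
        }

        /* MAP_PRIVATE: the mapping is copy-on-write with respect to the file */
        char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        p[0] = '!';     /* faults in a private copy of the first page;
                           the file itself is not modified */
        printf("first byte in memory is now '%c'\n", p[0]);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }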

May 05 '16 at 20:52


