About pointers after fork()

This is a somewhat technical question; maybe you can help me if you know C and UNIX (or maybe it's really a beginner's question!).

The question came up today while we were analyzing some code in our Operating Systems course. We are studying what it means to fork a process on UNIX. We already know that fork() creates a copy of the current process that runs in parallel with it, and that the two processes have separate data sections.

But then I thought: if you create a variable and a pointer to it before calling fork(), then, since the pointer stores the memory address of the variable, you could try to change the parent's variable from the child process through that pointer.

We tried code similar to this in class:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>   /* wait() */
#include <unistd.h>     /* fork() */

int main(void)
{
    int value = 0;
    int *pointer = &value;
    int status;
    pid_t pid;

    printf("Parent: Initial value is %d\n", value);
    pid = fork();
    switch (pid) {
    case -1: /* fork() failed */
        printf("Fork error, WTF?\n");
        exit(EXIT_FAILURE);
    case 0: /* Child process */
        printf("\tChild: I'll try to change the value\n"
               "\tChild: The pointer value is %p\n", (void *)pointer);
        (*pointer) = 1;
        printf("\tChild: I've set the value to %d\n", (*pointer));
        exit(EXIT_SUCCESS);
    }

    while (pid != wait(&status))
        ; /* Wait for the child process */
    printf("Parent: the pointer value is %p\nParent: The value is %d\n",
           (void *)pointer, value);
    return 0;
}

If you run it, you will get something like this:

Parent: Initial value is 0
    Child: I'll try to change the value
    Child: The pointer value is 0x7fff733b0c6c
    Child: I've set the value to 1
Parent: the pointer value is 0x7fff733b0c6c
Parent: The value is 0

Obviously, the child process had no effect on the parent process. Honestly, I was expecting a segmentation fault from accessing an invalid memory address. But what really happened?

Remember, I'm not looking for a way to communicate between processes; that is not the point. I want to know what the code did. Inside the child process the change is visible, so it DID do something.

My main hypothesis is that pointers are not absolute memory addresses but are relative to the process's stack. But I could not find an answer (no one in the class knew, and searching on Google only turned up some questions about process handling), so I would like to learn from you. I hope someone knows.

Thanks for taking the time to read!

+6
3 answers

The key point here is the concept of virtual address space.

Modern processors (say, anything newer than the 80386) have a memory management unit (MMU) that maps the virtual address space of each process to pages of physical memory, under the control of the kernel.

When the kernel sets up a process, it creates a set of page table entries for that process. These define which pages of physical memory back the virtual address space, and it is in this virtual address space that the program executes.

Conceptually, when you call fork(), the kernel copies the existing process's pages to a new set of physical pages and sets up page tables for the new process. As far as the new process is concerned, it is running in the same virtual memory layout as the original, while in fact it refers to completely different physical memory. That is why the pointer prints the same address in both processes but the two writes do not collide.

The details are more subtle, since no one wants to waste time copying hundreds of MB of data if it is not required. When a process calls fork(), the kernel sets up a second set of page table entries (for the new process) but points them at the same physical pages as the original process. It then sets a flag in both sets of page table entries so that the MMU treats the pages as read-only.

As soon as either process writes to such a page, the MMU generates a page fault (because the page table entry has the read-only flag set). The page fault handler then allocates a new page of physical memory, copies the data across, updates the page table entry, and sets the page back to read/write. Thus pages are only actually copied the first time a process tries to modify a copy-on-write page, and this sleight of hand goes completely unnoticed by either process.

Regards, Dan.

+11

Logically, the fork()ed process gets its own independent copy of more or less the entire state of the parent process. This would not work if pointers in the child referred to memory belonging to the parent.

The details of how a particular UNIX-like kernel achieves this can vary. Linux implements the child process's memory using copy-on-write pages, which makes fork() relatively cheap compared to other possible implementations. In that case, pointers in the child really do point to the parent process's memory until either the child or the parent tries to modify that memory, at which point a copy is made for the child to use. It all depends on the underlying virtual memory system. Other UNIX and UNIX-like systems can do, and have done, this differently.

+4
source

The child wrote through the pointer, which is completely legal in its own address space, because that address space is a copy of its parent's. There was no effect on the parent because the memory is not logically shared. Each process goes its own separate way after the fork.

There are several ways to create shared memory on UNIX (where one process can modify memory and have the modification be visible to another process), but fork() is not one of them. And that is a good thing, because otherwise synchronization between the parent and the child would be almost impossible.

+3

Source: https://habr.com/ru/post/977147/

