The hardware provides a memory management unit (MMU). This is a piece of circuitry that can intercept and alter any memory access. Whenever the processor accesses RAM, for example to read the next instruction to execute, or as a data access triggered by an instruction, it does so at some address which is, roughly speaking, a 32-bit value. A 32-bit word can take a bit more than 4 billion distinct values, so there is a 4 GB address space: that is the number of bytes that can each have a unique address.
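The arithmetic behind the 4 GB figure can be checked in a couple of lines. This is only an illustration; the names `ADDRESS_BITS` and `address_space_bytes` are made up for the example.

```python
ADDRESS_BITS = 32  # address width assumed in the text


def address_space_bytes(bits: int) -> int:
    """Number of distinct byte addresses an address of `bits` bits can name."""
    return 2 ** bits


size = address_space_bytes(ADDRESS_BITS)
print(size)             # 4294967296, "a bit more than 4 billion"
print(size // 2 ** 30)  # 4, i.e. 4 GB when 1 GB is counted as 2**30 bytes
```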
So the processor sends a request to its memory subsystem, along the lines of "fetch the byte at address x and give it back to me." The request goes through the MMU, which decides what to do with it. The MMU virtually splits the 4 GB space into pages; the page size depends on the hardware you use, but typical sizes are 4 and 8 kB. The MMU uses tables that tell it what to do with accesses to each page: either the access is granted with a rewritten address (the page entry says: "yes, the page containing address x exists, it is in physical RAM at address y") or it is rejected, at which point the kernel is invoked to handle things further. The kernel may decide to kill the offending process, or to do some work and alter the MMU tables so that the access can be tried again, this time successfully.
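To make the "rewritten address" idea concrete, here is a toy model of the lookup, assuming 4 kB pages and a single flat table held in a Python dict. Real MMUs use multi-level tables in hardware; `translate` and `PageFault` are hypothetical names for this sketch only.

```python
PAGE_SIZE = 4096  # 4 kB pages, one of the typical sizes mentioned above


class PageFault(Exception):
    """Raised when the toy MMU finds no mapping; a real MMU would trap to the kernel."""


def translate(page_table: dict, vaddr: int) -> int:
    """Rewrite a virtual address into a physical one using a flat page table."""
    page, offset = divmod(vaddr, PAGE_SIZE)  # split into page number and offset
    frame = page_table.get(page)             # look up the page entry
    if frame is None:
        raise PageFault(hex(vaddr))          # access rejected
    return frame * PAGE_SIZE + offset        # same offset, different page


table = {0x10: 0x2A}                   # virtual page 0x10 lives in physical frame 0x2A
print(hex(translate(table, 0x10123)))  # 0x2a123
```

Only the page number is rewritten; the offset inside the page passes through unchanged.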
This is the basis for virtual memory: from its own point of view, the process has some RAM, but the kernel has moved it to the hard disk, in "swap space". The corresponding page is marked as "absent" in the MMU tables. When the process accesses its data, the MMU invokes the kernel, which fetches the data from swap, puts it back in some free space in physical RAM, and alters the MMU tables to point at that space. The kernel then jumps back to the process code, right at the instruction that triggered the whole thing. The process code sees nothing of the whole business, except that the memory access took quite some time.
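The fault-and-retry cycle described above can be sketched as a small simulation. Everything here is an assumption made for illustration: `ToyMachine`, its dict-based "RAM" and "swap", and the trivial frame allocator are not how any real kernel is organized.

```python
PAGE_SIZE = 4096


class ToyMachine:
    """Toy model: a page table, some physical frames, and a swap area."""

    def __init__(self):
        self.page_table = {}      # virtual page -> physical frame (present pages only)
        self.ram = {}             # physical frame -> page contents
        self.swap = {}            # virtual page -> page contents moved to disk
        self.next_free_frame = 0  # naive allocator: frames are never reclaimed
        self.faults = 0

    def handle_fault(self, page: int) -> None:
        """Kernel side: bring the page back from swap, or refuse the access."""
        if page not in self.swap:
            raise MemoryError(f"invalid access to page {page:#x}")
        frame = self.next_free_frame
        self.next_free_frame += 1
        self.ram[frame] = self.swap.pop(page)  # copy the data back into RAM
        self.page_table[page] = frame          # update the MMU tables
        self.faults += 1

    def read_byte(self, vaddr: int) -> int:
        page, offset = divmod(vaddr, PAGE_SIZE)
        if page not in self.page_table:  # MMU: page is marked absent
            self.handle_fault(page)      # kernel repairs the mapping
        frame = self.page_table[page]    # the access is retried and now succeeds
        return self.ram[frame][offset]


m = ToyMachine()
m.swap[5] = bytes([7]) * PAGE_SIZE      # page 5 currently lives in swap
print(m.read_byte(5 * PAGE_SIZE + 3))   # 7 (first access faults, then succeeds)
print(m.read_byte(5 * PAGE_SIZE + 4))   # 7 (page is now present, no fault)
print(m.faults)                         # 1
```

From the caller's side, `read_byte` simply returns the data; the only visible difference on the first access would be the extra time spent in `handle_fault`.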
The MMU also handles access rights, which prevent a process from reading or writing data belonging to other processes or to the kernel. Each process has its own set of MMU tables, and the kernel manages these tables. Thus, each process has its own address space, as if it were alone on a machine with 4 GB of RAM, except that the process had better not access memory that it has not rightfully allocated from the kernel, since the corresponding pages are marked as absent or forbidden.
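A small sketch of per-process tables with a write-permission bit may help here. It assumes 4 kB pages and a hypothetical entry layout of `(frame, writable)`; `access` and `AccessDenied` are names invented for the example.

```python
PAGE_SIZE = 4096


class AccessDenied(Exception):
    """The toy MMU refused the access (page absent or permission missing)."""


def access(table: dict, vaddr: int, write: bool = False) -> int:
    """Check rights for one access and return the physical address."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    entry = table.get(page)
    if entry is None:
        raise AccessDenied("page absent")
    frame, writable = entry
    if write and not writable:
        raise AccessDenied("page is read-only")
    return frame * PAGE_SIZE + offset


# Each process has its own table, so the same virtual address
# lands in different physical frames.
proc_a = {0: (7, True)}   # page 0 -> frame 7, writable
proc_b = {0: (9, False)}  # page 0 -> frame 9, read-only

print(hex(access(proc_a, 0x10)))  # 0x7010
print(hex(access(proc_b, 0x10)))  # 0x9010
```

Neither table contains an entry pointing at the other process's frame, which is what keeps the two address spaces isolated in this model.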
When the kernel is invoked through a system call from some process, the kernel code must run within the address space of that process; so the kernel code must be somewhere in the address space of each process (but protected: the MMU tables prevent access to kernel memory from unprivileged user code). Since code can contain hard-coded addresses, the kernel had better be at the same address for all processes; conventionally, in Linux, that address is 0xC0000000. The MMU tables for each process map that part of the address space to whatever physical RAM blocks the kernel was actually loaded into at boot time. Note that kernel memory is never swapped out (if the code that reads data back from swap space were itself swapped out, things would turn sour quite fast).
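With the kernel at 0xC0000000 in a 4 GB space, the split works out to 3 GB for user code and 1 GB for the kernel. The check below is a simplified sketch of the protection rule; in real hardware it is enforced per page through the MMU tables, and `may_access` is a hypothetical helper.

```python
KERNEL_BASE = 0xC0000000  # conventional kernel base address cited above
ADDRESS_SPACE = 2 ** 32   # 4 GB virtual address space
GB = 2 ** 30


def may_access(vaddr: int, privileged: bool) -> bool:
    """User code may only touch addresses below KERNEL_BASE; the kernel may touch all."""
    if not 0 <= vaddr < ADDRESS_SPACE:
        raise ValueError("address outside the 32-bit space")
    return privileged or vaddr < KERNEL_BASE


print(KERNEL_BASE // GB)                          # 3 (GB available to user code)
print((ADDRESS_SPACE - KERNEL_BASE) // GB)        # 1 (GB reserved for the kernel)
print(may_access(0xC0001000, privileged=False))   # False
print(may_access(0xC0001000, privileged=True))    # True
```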
On a PC, things can be a little more complicated, because there are 32-bit and 64-bit modes, segment registers, and PAE (which adds a level to the page tables so that 32-bit virtual addresses can be mapped onto more than 4 GB of physical RAM). The basic concept remains unchanged: each process gets its own view of a virtual 4 GB address space, and the kernel uses the MMU to map each virtual page to an appropriate physical position in RAM, or to nowhere at all.