Why can we write out of bounds in C?

I recently finished reading about virtual memory, and I had a question about how malloc works in virtual address space and physical memory.

For example (code copied from another SO post):

    void main(){
        int *p;
        p=malloc(sizeof(int));
        p[500]=999999;
        printf("p[0]=%d\n",p[500]); //works just fine.
    }

Why is this allowed? Or, put another way, why is the address at p[500] even writable?

Here is my hunch.

When malloc is called, perhaps the OS decides to give the process a whole page. I'm assuming that each page is 4 KB in size. Is the whole page marked as writable? That would explain why you can go as far as 500 * sizeof(int) into the page (assuming a 32-bit system, where int is 4 bytes).

I see that when I try to write at a much larger index ...

  p[500000]=999999; // EXC_BAD_ACCESS according to XCode 

... it segfaults.

If so, does this mean that there are pages dedicated to your code / instructions / text segment that are marked as unwritable, completely separate from the pages where your stack / variables live (where things do change), which are marked as writable? Of course, the process thinks they are next to each other in the 4 GB address space on a 32-bit system.

+6
6 answers

Consider the following code for Linux:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int staticvar;
    const int constvar = 0;

    int main(void)
    {
        int stackvar;
        char buf[200];
        int *p;
        p = malloc(sizeof(int));
        sprintf(buf, "cat /proc/%d/maps", getpid());
        system(buf);

        printf("&staticvar=%p\n", &staticvar);
        printf("&constvar=%p\n", &constvar);
        printf("&stackvar=%p\n", &stackvar);
        printf("p=%p\n", p);
        printf("undefined behaviour: &p[500]=%p\n", &p[500]);
        printf("undefined behaviour: &p[50000000]=%p\n", &p[50000000]);

        p[500] = 999999; //undefined behaviour
        printf("undefined behaviour: p[500]=%d\n", p[500]);
        return 0;
    }

It prints the memory map of the process and the addresses of several kinds of variables.

    [osboxes@osboxes ~]$ gcc tmp.c -g -static -Wall -Wextra -m32
    [osboxes@osboxes ~]$ ./a.out
    08048000-080ef000 r-xp 00000000 fd:00 919429 /home/osboxes/a.out
    080ef000-080f2000 rw-p 000a6000 fd:00 919429 /home/osboxes/a.out
    080f2000-080f3000 rw-p 00000000 00:00 0
    0824d000-0826f000 rw-p 00000000 00:00 0 [heap]
    f779c000-f779e000 r--p 00000000 00:00 0 [vvar]
    f779e000-f779f000 r-xp 00000000 00:00 0 [vdso]
    ffe4a000-ffe6b000 rw-p 00000000 00:00 0 [stack]
    &staticvar=0x80f23a0
    &constvar=0x80c2fcc
    &stackvar=0xffe69b88
    p=0x824e2a0
    undefined behaviour: &p[500]=0x824ea70
    undefined behaviour: &p[50000000]=0x1410a4a0
    undefined behaviour: p[500]=999999

Or why is the address at p[500] even writable?

The heap spans 0824d000-0826f000 and &p[500] happens to be 0x824ea70, so that memory is readable and writable, but the region may contain real data that the write would then alter! In this sample program it is most likely unused, so writing to it does the process no harm.

&p[50000000] happens to be 0x1410a4a0, which is not in any page the kernel has mapped into the process, so it is neither writable nor readable; hence the segfault.
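The reasoning above (an address is usable only if it falls inside a mapped range with the right permission) can be sketched in a few lines of C. This is only an illustration under assumptions: the names `struct mapping`, `parse_maps_line` and `is_writable_at` are hypothetical helpers, not part of any library, and only the first three fields of a `/proc/<pid>/maps` line are parsed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper type: one parsed maps line, covering [start, end). */
struct mapping {
    unsigned long start;
    unsigned long end;
    char perms[5]; /* permission field, e.g. "rw-p" */
};

/* Parse "start-end perms ..." from one maps line; true on success. */
bool parse_maps_line(const char *line, struct mapping *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "%lx-%lx %4s", &out->start, &out->end, out->perms) == 3;
}

/* True if addr lies inside the mapping and the mapping is writable. */
bool is_writable_at(const struct mapping *m, unsigned long addr)
{
    assert(m != NULL);
    return addr >= m->start && addr < m->end && m->perms[1] == 'w';
}
```

Fed the `[heap]` line from the output above, `is_writable_at` accepts 0x824ea70 (the address of p[500]) and rejects 0x1410a4a0 (the address of p[50000000]), which matches what the program did in that run.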

If you compile it with -fsanitize=address, memory accesses are checked, and many (but not all) illegal memory accesses are reported by AddressSanitizer. The program runs about two times slower than without AddressSanitizer.

    [osboxes@osboxes ~]$ gcc tmp.c -g -Wall -Wextra -m32 -fsanitize=address
    [osboxes@osboxes ~]$ ./a.out
    [...]
    undefined behaviour: &p[500]=0xf5c00fc0
    undefined behaviour: &p[50000000]=0x1abc9f0
    =================================================================
    ==2845==ERROR: AddressSanitizer: heap-buffer-overflow on address 0xf5c00fc0 at pc 0x8048972 bp 0xfff44568 sp 0xfff44558
    WRITE of size 4 at 0xf5c00fc0 thread T0
        #0 0x8048971 in main /home/osboxes/tmp.c:24
        #1 0xf70a4e7d in __libc_start_main (/lib/libc.so.6+0x17e7d)
        #2 0x80486f0 (/home/osboxes/a.out+0x80486f0)

    AddressSanitizer can not describe address in more detail (wild memory access suspected).
    SUMMARY: AddressSanitizer: heap-buffer-overflow /home/osboxes/tmp.c:24 main
    [...]
    ==2845==ABORTING

If so, does this mean that there are pages dedicated to your code / instructions / text segment that are marked as unwritable, completely separate from the pages where your stack / variables live (where things do change), which are marked as writable?

Yes, see the process memory map in the output above. r-xp means readable and executable; rw-p means readable and writable.

+5

"Why is this allowed?" (writing out of bounds)

C does not require the additional CPU instructions that would typically be needed to prevent such out-of-range access.

That is the speed of C: it trusts the programmer and gives the coder all the rope needed to get the task done, including enough rope to hang themselves.

+11

Why is this allowed?

One of the main design goals of the C (and C++) languages is to be as efficient as possible at run time. The designers of C (or C++) could have decided to include a rule in the language specification saying that "writing outside the bounds of an array must cause X to occur" (where X is some well-defined behavior, such as a crash or a thrown exception) ... but if they had, every C compiler would be required to generate bounds-checking code for every array access the C program makes. Depending on the target hardware and the cleverness of the compiler, enforcing such a rule could easily make every C (or C++) program 5-10 times slower than it is now.

Therefore, instead of requiring the compiler to enforce array bounds, they simply specified that writing outside the bounds of an array is undefined behavior. That is, you shouldn't do it, but if you do, there are no guarantees about what will happen, and anything that happens that you don't like is your problem, not theirs.
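To make the cost concrete, here is a minimal sketch contrasting a plain store with a bounds-checked one. The function names `store_unchecked` and `store_checked` are hypothetical, and the caller has to pass the array length explicitly, because a C pointer does not carry it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Plain store: an address computation followed by one write. */
void store_unchecked(int *arr, size_t idx, int value)
{
    arr[idx] = value;
}

/* Checked store: the extra comparison and branch on every call are the
   kind of per-access work a mandatory bounds-checking rule would add. */
bool store_checked(int *arr, size_t len, size_t idx, int value)
{
    assert(arr != NULL);
    if (idx >= len) {
        return false; /* out of bounds: refuse instead of writing */
    }
    arr[idx] = value;
    return true;
}
```

With a four-element array, `store_checked(a, 4, 500, 9)` returns false and leaves memory untouched, whereas `store_unchecked(a, 500, 9)` would be exactly the undefined behavior discussed in the question.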

Real-world implementations are then free to do whatever they want. For example, on an OS with memory protection you will most likely see page-based behavior like the one you described, while on an embedded device (or on an older OS such as MacOS 9, MS-DOS or AmigaDOS) the computer may simply let you write anywhere in memory, because doing otherwise would make the computer too slow.

As a low-level language (by modern standards), C (and C++) expects the programmer to follow the rules, and will only mechanically enforce those rules if/when it can do so without incurring run-time overhead.

+4

Undefined behavior.

That is what it is. You can try to write out of bounds, but it is not guaranteed to work. It might work, it might not. What happens is completely undefined.

Why is this allowed?

Because the C and C++ standards allow it. The languages are designed to be fast. Checking for out-of-bounds accesses would require a run-time operation, which would slow the program down.

Why is the address at p[500] even writable?

It just happened to be. Undefined behavior.

I see that when I try to write at a much larger index ...

See? Again, it just happened to segfault.

When malloc is called, perhaps the OS decides to give the process a whole page.

Perhaps, but the C and C++ standards do not require this behavior. They only require that the OS make at least the requested amount of memory available for use by the program (if memory is available).

+2

This behavior is undefined ...

  • if you try to access outside the bounds, anything may happen, including a SIGSEGV or corruption elsewhere on the stack that causes your program to produce wrong results, hang, crash later, etc.

  • the memory may be writable without obvious failure on some given run, for some compiler / flags / OS / day of the week, etc., because:

    • malloc() may actually allocate a larger block in which [500] can be written (but on another run of the program it might not), or
    • [500] may be past the allocated block, but still in memory that is accessible to the program
      • it is likely that [500], being a relatively small increment, is still in the heap, which may extend further than the addresses that malloc calls have handed out so far, because of some earlier reservation of heap memory (for example, using sbrk()) in preparation for anticipated use
      • it is just vaguely possible that [500] is "off the end" of the heap, and you end up writing to some other memory area, for example over static data or thread-specific data (including the stack)

Why is this allowed?

There are two aspects:

  • checking indices on every access would bloat the program (add extra machine code instructions) and slow it down, and generally the programmer can do some minimal validation of indices (for example, validate once when a function is entered, then use the index however many times) or generate the indices in a way that guarantees their validity (for example, looping from 0 to the array size)

  • managing memory so precisely that an out-of-bounds access is reported by some CPU fault is highly hardware-dependent and in general only possible at page boundaries (for example, a granularity in the 1 to 4 kilobyte range), and it takes extra instructions (whether inside some enhanced malloc function or in some malloc-wrapping code) and time to orchestrate.
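The "validate once, then index freely" approach from the first point might look like the sketch below. The function name `sum_range` and its signature are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sum arr[first] .. arr[first + count - 1] into *out.
   The requested range is validated once on entry; the loop then indexes
   without any per-access check, because every index is known to be valid. */
bool sum_range(const int *arr, size_t len, size_t first, size_t count, long *out)
{
    assert(arr != NULL && out != NULL);
    /* Written this way so the check itself cannot overflow. */
    if (first > len || count > len - first) {
        return false; /* requested range does not fit in the array */
    }
    long total = 0;
    for (size_t i = first; i < first + count; i++) {
        total += arr[i];
    }
    *out = total;
    return true;
}
```

For an array holding 1, 2, 3, 4, 5, the call `sum_range(a, 5, 1, 3, &s)` succeeds and stores 9 in `s`, while `sum_range(a, 5, 3, 3, &s)` is rejected up front, at the cost of one comparison per call rather than one per element.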

+1

It's just that in C the concept of an array is pretty basic.

The assignment to p[500] is, in C, the same as:

 *(p+500)=999999; 

and all the compiler does to implement this is:

    fetch p;
    calculate offset: multiply '500' by sizeof(*p) -- e.g. 4 for int;
    add p and the offset to get the memory address;
    write to that address.

In many architectures, this is implemented in one or two instructions.
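The steps above can be written out as ordinary integer arithmetic. This is a sketch under assumptions: `element_address` and `same_address` are hypothetical names, and 32-bit integers are used so that the numbers line up with the -m32 build shown in an earlier answer, where p was 0x824e2a0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte address of element `index` in an array that starts at `base`:
   base + index * element size, with no bounds check anywhere. */
uint32_t element_address(uint32_t base, uint32_t index, uint32_t elem_size)
{
    return base + index * elem_size;
}

/* The same identity on real pointers: &p[i] and p + i are one address.
   Only call this with i inside the array (or one past its end). */
int same_address(int *p, size_t i)
{
    assert(p != NULL);
    return &p[i] == p + i;
}
```

With 4-byte ints, `element_address(0x824e2a0, 500, 4)` gives 0x824ea70 and `element_address(0x824e2a0, 50000000, 4)` gives 0x1410a4a0, the two addresses that program printed. Nothing in the computation knows how large the allocation was.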

Please note that the compiler not only does not know that the value 500 is not in the array, but also does not know the size of the array to start with!

Some work has been done in C99 and later to make arrays safer, but fundamentally C is a language designed to be fast to compile and fast to run, not to be safe.

Put another way: in Pascal, the compiler will stop you from shooting yourself in the foot. In C++, the compiler provides ways to make it harder to shoot yourself in the foot, while in C the compiler does not even know that you have a foot.

+1

Source: https://habr.com/ru/post/983994/

