Example of an error caused by the UB of incrementing a NULL pointer

This code:

    int *p = nullptr;
    p++;

causes undefined behavior, as discussed in Is a null pointer valid?

But when explaining to colleagues why they should avoid UB, beyond the general argument that it is bad because UB means anything can happen, I like to have a concrete example demonstrating it. I have tons of them for accessing an array out of bounds, but I could not find one for this.

I even tried

    #include <cstdint>   // for intptr_t

    int testptr(int *p) {
        intptr_t ip;
        int *p2 = p + 1;
        ip = (intptr_t) p2;
        if (p == nullptr) {
            ip *= 2;
        } else {
            ip *= -2;
        }
        return (int) ip;
    }

in a separate compilation unit, hoping that the optimizing compiler would skip the test, because when p is null the line int *p2 = p + 1; is UB, and compilers are allowed to assume that code does not contain UB.

But gcc 4.8.2 (I don't have a gcc 4.9 handy) and clang 3.4.1 both dutifully return a positive value!
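For anyone reproducing the experiment, a minimal driver that calls testptr from a separate compilation unit (so the call cannot be analyzed away) might look like this; it is only a sketch, and the file name is illustrative:

    // main.cpp -- link against the compilation unit that defines testptr
    #include <cstdio>

    int testptr(int *p);  // defined in the other compilation unit

    int main() {
        // If the optimizer assumed p != nullptr because of the p + 1,
        // this would print a negative value; a positive value means the
        // nullptr branch was honestly executed.
        std::printf("%d\n", testptr(nullptr));
        return 0;
    }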

Can anyone suggest some smarter code, or another optimizing compiler, that shows the problem with incrementing a null pointer?

+6
c++ undefined-behavior
Apr 24 '15 at 10:30
4 answers

How about this example:

    int main(int argc, char* argv[]) {
        int a[] = { 111, 222 };
        int *p = (argc > 1) ? &a[0] : nullptr;
        p++;
        p--;
        return (p == nullptr);
    }

At face value, this code says: "If there are any command-line arguments, initialize p to point to the first element of a[], otherwise initialize it to null. Then increment it, then decrement it, and tell me whether it is null."

At first glance, this should return 0 (p is not null) if we give a command-line argument, and 1 (meaning null) if we don't. Note that we never dereference p, and that if we give an argument, p always stays within the bounds of a[].

Compiling with the command line clang -S --std=c++11 -O2 nulltest.cpp (Cygwin clang 3.5.1) gives the following generated code:

        .text
        .def     main;
        .scl    2;
        .type   32;
        .endef
        .globl  main
        .align  16, 0x90
    main:                                   # @main
    .Ltmp0:
        .seh_proc main
    # BB#0:
        pushq   %rbp
    .Ltmp1:
        .seh_pushreg 5
        movq    %rsp, %rbp
    .Ltmp2:
        .seh_setframe 5, 0
    .Ltmp3:
        .seh_endprologue
        callq   __main
        xorl    %eax, %eax
        popq    %rbp
        retq
    .Leh_func_end0:
    .Ltmp4:
        .seh_endproc

This code says "return 0". It doesn't even bother checking the number of command line arguments.

(And interestingly, commenting out the decrement has no effect on the generated code.)
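To make the folding observable at runtime rather than in an assembly listing, here is a variant sketch (assuming the same optimization fires with the same flags; the program deliberately contains the UB under discussion, so the output is compiler-dependent):

    #include <cstdio>

    int main(int argc, char* argv[]) {
        int a[] = { 111, 222 };
        int *p = (argc > 1) ? &a[0] : nullptr;
        p++;  // UB if p is null, so the optimizer may assume p == &a[0]
        p--;
        // If the optimizer reasons as above, this prints "p is non-null"
        // even when the program is run with no arguments.
        std::printf(p == nullptr ? "p is null\n" : "p is non-null\n");
        return 0;
    }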

+6
May 20 '15 at 13:54

Extracted from http://c-faq.com/null/machexamp.html :

Q: Seriously, have any actual machines really used nonzero null pointers, or different representations for pointers to different types?

A: The Prime 50 series used segment 07777, offset 0 for the null pointer, at least for PL/I. Later models used segment 0, offset 0 for null pointers in C, necessitating new instructions such as TCNP (Test C Null Pointer), evidently as a sop to [footnote] all the extant poorly-written C code which made incorrect assumptions. Older, word-addressed Prime machines were also notorious for requiring larger byte pointers (char *'s) than word pointers (int *'s).

The Eclipse MV series from Data General has three architecturally supported pointer formats (word, byte, and bit pointers), two of which are used by C compilers: byte pointers for char * and void *, and word pointers for everything else. For historical reasons during the evolution of the 32-bit MV line from the 16-bit Nova line, word pointers and byte pointers have the offset, indirection, and ring-protection bits in different places within the word. Passing a mismatched pointer format to a function resulted in protection faults. Eventually, the MV C compiler added many compatibility options to try to deal with code that had pointer-type-mismatch errors.

Some Honeywell-Bull mainframes use the bit pattern 06000 for (internal) null pointers.

The CDC Cyber 180 series has 48-bit pointers consisting of a ring, segment, and offset. Most users (in ring 11) have null pointers of 0xB00000000000. It was common on older, ones-complement CDC machines to use an all-one-bits word as a special flag for all kinds of data, including invalid addresses.

The old HP 3000 series uses a different addressing scheme for byte addresses than for word addresses; like several of the machines above, it therefore uses different representations for char * and void * pointers than for other pointers.

The Symbolics Lisp Machine, a tagged architecture, does not even have conventional numeric pointers; it uses the pair <NIL, 0> (basically a handle to a nonexistent <object, offset>) as a C null pointer.

Depending on the memory model in use, 8086-family processors (PC compatibles) may use 16-bit data pointers and 32-bit function pointers, or vice versa.

Some 64-bit Cray machines represent int * in the lower 48 bits of a word; char * additionally uses some of the upper 16 bits to indicate a byte address within a word.

Given that null pointers have such odd bit-pattern representations on the machines quoted above, the code you posted:

    int *p = nullptr;
    p++;

will not give the value expected by most people ( 0 + sizeof(*p) ).

Instead, you would get a value derived from that machine's specific nullptr bit pattern (unless the compiler special-cases null pointer arithmetic; but since that is not required by the standard, you would most likely face undefined behavior with a "visible" concrete effect).
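To see what that means in practice, here is a minimal sketch of such an experiment (note that it deliberately performs the very null-pointer arithmetic being discussed, so it is UB and any output is illustrative only):

    #include <cstdio>

    int main() {
        int *p = nullptr;
        p++;  // undefined behavior: arithmetic on a null pointer
        // On a flat-address-space machine most people would expect the
        // printed value to correspond to 0 + sizeof(int); on a machine
        // whose null pointer is, say, 0xB00000000000, anything could happen.
        std::printf("%p\n", static_cast<void*>(p));
        return 0;
    }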

+6
May 20 '15 at 11:49

An ideal C implementation would, when not being used for the kinds of systems programming that require pointers which the programmer knows to be meaningful even though the compiler does not, ensure that every pointer was either valid or recognizably invalid, and would trap any time code either tried to dereference an invalid pointer (including null) or used illegitimate means to create something that was not a valid pointer but could be mistaken for one. On most platforms, having generated code enforce such a constraint in all situations would be quite expensive, but guarding against many common erroneous scenarios is much cheaper.

On many platforms, it is relatively inexpensive to have the compiler generate for *foo=23 code equivalent to if (!foo) NULL_POINTER_TRAP(); else *foo=23;. Even primitive compilers in the 1980s often had an option to do so. The usefulness of such trapping may be largely lost, however, if compilers allow a null pointer to be incremented in such a way that it is no longer recognizable as a null pointer. Consequently, a good compiler should, when error trapping is enabled, replace foo++; with foo = (foo ? foo+1 : (NULL_POINTER_TRAP(),0));. Arguably, the real "billion dollar mistake" was not inventing null pointers, but the fact that some compilers trap direct null-pointer stores yet do not trap null-pointer arithmetic.
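As a rough illustration of the transformation described above, here is a toy checked-pointer wrapper in C++ (entirely hypothetical; null_pointer_trap is just a sketch of the trap this answer proposes, not any real compiler's mechanism):

    #include <cstdio>
    #include <cstdlib>

    [[noreturn]] void null_pointer_trap() {
        std::fputs("trap: null pointer operation\n", stderr);
        std::abort();
    }

    template <typename T>
    struct CheckedPtr {
        T *p;

        // *foo = 23 becomes: trap if null, else store
        T &operator*() const {
            if (!p) null_pointer_trap();
            return *p;
        }
        // foo++ becomes: trap if null, else advance
        CheckedPtr &operator++() {
            if (!p) null_pointer_trap();
            ++p;
            return *this;
        }
    };

    int main() {
        int x = 0;
        CheckedPtr<int> good{&x};
        *good = 23;   // fine
        ++good;       // fine (now one past x, never dereferenced)

        CheckedPtr<int> bad{nullptr};
        ++bad;        // traps here instead of yielding a bogus non-null pointer
        return 0;
    }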

Given that an ideal compiler would trap an attempt to increment a null pointer (many compilers fail to do so for performance reasons rather than semantic ones), I see no reason why code should expect such an increment to be meaningful. In almost any case where a programmer might expect the compiler to assign a meaning to such a construct [e.g., ((char*)0)+5 yielding a pointer to address 5], it would be better for the programmer to use some other construct to form the desired pointer (e.g., ((char*)5)).

+2
May 20 '15 at 18:32

This is just for completeness, but the link suggested by @HansPassant in a comment really deserves to be quoted as an answer.

The full reference is available here; below are just some excerpts.

This paper describes a new memory-safe interpretation of the C abstract machine, providing stronger protection to benefit security and debugging ... [The authors] demonstrate that it is possible for a memory-safe implementation of C to support not just the C abstract machine as specified, but a broader interpretation that is still compatible with existing code. By enforcing the model in hardware, our implementation provides memory safety that can be used to provide high-level security properties for C ...

Memory capabilities

[Capabilities] are represented as a (base, bound, permissions) triple, loosely packed into a 256-bit value. Here the base provides an offset into the virtual address region, and the bound limits the size of the region that may be accessed ... Special capability load and store instructions allow capabilities to be spilled to the stack or stored in data structures, just like pointers ... with the caveat that pointer subtraction is not allowed.

Permissions additionally allow capabilities to act as tokens granting certain rights to the referenced memory. For example, a memory capability may have permissions to read data and capabilities, but not to write them (or just to write data, but not capabilities). Attempting any operation that is not permitted will cause a trap.

[The results] confirm that it is possible to retain the strong semantics of a capability-system memory model (which provides non-bypassable memory protection) without sacrificing the advantages of a low-level language.

(emphasis mine)

This means that even if no production compiler does it yet, research into building one that could trap the use of invalid pointers exists and has already been published.
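To make the excerpt more concrete, here is a toy software model of the (base, bound, permissions) triple (purely illustrative; real CHERI capabilities are a 256-bit hardware format enforced by the processor, not a C++ class):

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>

    enum Perms : std::uint32_t { PERM_READ = 1, PERM_WRITE = 2 };

    // Toy model of a memory capability: (base, bound, permissions).
    struct Capability {
        std::uintptr_t base;   // start of the region this capability grants access to
        std::size_t    bound;  // size of the region
        std::uint32_t  perms;  // operations the holder is allowed to perform

        std::uint8_t load(std::size_t offset) const {
            if (!(perms & PERM_READ) || offset >= bound) trap("load");
            return *reinterpret_cast<const std::uint8_t *>(base + offset);
        }
        void store(std::size_t offset, std::uint8_t value) const {
            if (!(perms & PERM_WRITE) || offset >= bound) trap("store");
            *reinterpret_cast<std::uint8_t *>(base + offset) = value;
        }

        [[noreturn]] static void trap(const char *op) {
            std::fprintf(stderr, "capability violation on %s\n", op);
            std::abort();
        }
    };

    int main() {
        std::uint8_t buf[16] = {0};
        // Read-only capability over buf: in-bounds reads succeed,
        // writes and out-of-bounds accesses trap.
        Capability cap{reinterpret_cast<std::uintptr_t>(buf), sizeof buf, PERM_READ};

        std::printf("%d\n", cap.load(3));  // fine: in bounds, readable
        cap.store(3, 42);                  // traps: no write permission
        return 0;
    }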

+1
May 21 '15 at 20:36


