C: Why do uninitialized pointers point to unpredictable memory and NOT to NULL?

I went to school for C a long time ago. I remember what I really hated about C: uninitialized pointers do not point to NULL.

I asked many people, including teachers, why in the world the default behavior of an unassigned pointer is to point at something unpredictable rather than NULL, since the unpredictable seems far more dangerous.

The answer was supposedly performance, but I never bought it. I think many errors in programming history could have been avoided if C had defaulted to NULL.

Here's some C code pointing out (pun intended) what I'm talking about:

    #include <stdio.h>

    int main(void)
    {
        int *randomA;
        int *randomB;
        int *nullA = NULL;
        int *nullB = NULL;

        printf("randomA: %p, randomB: %p, nullA: %p, nullB: %p\n\n",
               (void *)randomA, (void *)randomB, (void *)nullA, (void *)nullB);
        return 0;
    }

This compiles with warnings (it's nice to see that C compilers are much better than when I was in school) and produces:

randomA: 0xb779eff4, randomB: 0x804844b, nullA: (nil), nullB: (nil)

+46
c pointers
Jun 23 '10 at 13:12
11 answers

Actually, it depends on the storage duration of the pointer. Pointers with static storage duration are initialized to null pointers. Pointers with automatic storage duration are left uninitialized. See ISO C99 6.7.8 §10:

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:

  • if it has pointer type, it is initialized to a null pointer;
  • if it has arithmetic type, it is initialized to (positive or unsigned) zero;
  • if it is an aggregate, every member is initialized (recursively) according to these rules;
  • if it is a union, the first named member is initialized (recursively) according to these rules.

And yes, objects with automatic storage duration are left uninitialized for performance reasons. Imagine initializing a 4K array on every call to a logging function (something I saw in a project I worked on; fortunately, C let me avoid the initialization, which gave a nice performance improvement).
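
To make that cost concrete, here is a minimal sketch (the function names and the 4K size are invented for illustration): the only difference between the two helpers is the = {0}, but the second one has to zero 4096 bytes on every single call before snprintf overwrites most of them anyway.

    #include <stdio.h>

    /* Hypothetical logging helper: buf is automatic, so C leaves it
       uninitialized -- no per-call cost on entry. */
    void log_message_fast(const char *msg)
    {
        char buf[4096];                /* contents indeterminate, which is fine */
        snprintf(buf, sizeof buf, "[log] %s", msg);
        fputs(buf, stderr);
    }

    /* Same helper, but the initializer forces all 4096 bytes to be
       zeroed on every call. */
    void log_message_zeroed(const char *msg)
    {
        char buf[4096] = {0};
        snprintf(buf, sizeof buf, "[log] %s", msg);
        fputs(buf, stderr);
    }

    int main(void)
    {
        log_message_fast("hello\n");
        log_message_zeroed("world\n");
        return 0;
    }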

+41
Jun 23 '10 at 13:29

Because in C, declaration and initialization are deliberately separate steps. They are deliberately separate because that is how C was designed.

If you say this inside a function:

    void demo(void)
    {
        int *param;
        ...
    }

You say: "My dear C compiler, when you create a stack frame for this function, don't forget to reserve sizeof(int*) bytes for storing the pointer." The compiler does not ask what is happening there - it is assumed that you will tell it soon. If you do not, maybe the best language for you;)

It probably wouldn't be devilishly difficult to produce some safe stack-clearing code. But it would have to run on every function call, and I doubt many C developers would appreciate the hit when they are just going to fill the memory themselves anyway. Incidentally, there's a lot you can do for performance if you're allowed to be flexible with the stack. For example, the compiler can make optimizations where...

If function1 calls another function2 and stores its return value, or maybe there are some parameters passed to function2 that aren't changed inside function2... we don't have to allocate extra space, right? Just use the same part of the stack for both! Note that this is in direct conflict with the concept of initializing the stack before every use.

But more broadly (and, in my opinion, more importantly), this is in line with C's philosophy of not doing any more than absolutely necessary. And this holds whether you're working on a PDP-11, a PIC32MX (which is what I use it for) or a Cray XT3. It's exactly why people choose C over other languages.

  • If I want to write a program with no trace of malloc and free, I don't have to! No memory management is forced on me!
  • If I want to bit-pack and type-pun a data union, I can! (As long as I read my implementation's notes on standards compliance, of course.)
  • If I know exactly what I'm doing with my stack frame, the compiler doesn't have to do anything else for me!

In short, when you ask the C compiler to jump, it doesn't ask how high. The resulting code probably won't even come back down.

Since most people who choose to develop in C like it that way, it has enough inertia not to change. Your way may not be an inherently bad idea; it's just not really asked for by many other C developers.

+26
Jun 23 '10 at 13:50

This is for performance.

C was first developed on the PDP-11, for which 60k was the total maximum memory; many machines had far less. Unnecessary assignments would be particularly expensive in that kind of environment.

Today there are many embedded devices using C for which 60k of memory would seem boundless; the PIC12F675 has 1k of memory.

+14
Jun 23 '10 at 13:15

It's because when you declare a pointer, your C compiler just reserves the space needed to hold it. So when you run your program, that space may already have a value in it, likely left over from data previously stored in that part of memory.

The C compiler could assign a value to that pointer, but most of the time this would be a waste, since you are going to assign it a value of your own somewhere in the code anyway.

That is why good compilers warn you when you don't initialize your variables; I don't think there are that many errors due to this behavior. You just have to read the warnings.
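
For instance, compiling the question's code with warnings enabled (gcc -Wall, say) flags the problem. The exact wording of the diagnostic varies by compiler and version, but it looks roughly like the one quoted in this sketch's comment:

    #include <stdio.h>

    int main(void)
    {
        int *randomA;                     /* never assigned */
        printf("%p\n", (void *)randomA);  /* gcc -Wall reports something like:
                                             warning: 'randomA' is used
                                             uninitialized [-Wuninitialized] */
        return 0;
    }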

+8
Jun 23 '10 at 13:28

Pointers are not special in this regard; other types of variables have exactly the same problem if you use them uninitialized:

    int a;
    double b;
    printf("%d, %f\n", a, b);

The reason is simple: requiring the runtime to set uninitialized values to a known value adds overhead to every function call. The overhead may not be much for a single value, but consider what happens when you have a large array of pointers:

    int *a[20000];
+7
Jun 23 '10 at 13:29

When you declare a (pointer) variable at the beginning of a function, the compiler does one of two things: it sets aside a register to use for that variable, or it allocates space on the stack for it. For most processors, allocating the memory for all local variables on the stack is done with a single instruction: the compiler figures out how much memory all the local vars will need and pulls the stack pointer down (or pushes it up, on some processors) by that amount. Whatever is in that memory at the time is left unchanged unless you explicitly change it.

The pointer is not "set" to a "random" value. Before the stack allocation, the memory below the stack pointer (SP) contains whatever is left from earlier use:

      .
      .
      SP --->  45
               ff
               04
               f9
               44
               23
               01
               40
      .
      .
      .

After allocating memory for the local pointer, the only thing that has changed is the stack pointer:

      .
      .
               45
               ff  |
               04  |  allocated memory for pointer
               f9  |
      SP --->  44  |
               23
               01
               40
      .
      .
      .

This allows the compiler to allocate all local vars in one instruction that moves the stack pointer down (and to free them all in one instruction, moving the stack pointer back up), but it forces you to initialize them yourself if you need that.
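
A classic way to see this in action is the following sketch. It is undefined behavior (an optimizing compiler is free to destroy the effect), but compiled without optimization, the two functions typically get the same stack slot, so the uninitialized variable in the second one often still holds the first one's value:

    #include <stdio.h>

    void first(void)
    {
        int x = 42;                /* writes 42 into this frame's stack slot */
        printf("first:  %d\n", x);
    }

    void second(void)
    {
        int y;                     /* same slot, never initialized */
        printf("second: %d\n", y); /* undefined behavior; often prints 42 at -O0
                                      because the old bytes are still there */
    }

    int main(void)
    {
        first();
        second();
        return 0;
    }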

In C99, you can mix code and declarations, so you can postpone declaring the variable until you are able to initialize it. That saves you from having to set it to NULL first.
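
A small sketch of the difference (names invented for illustration):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int n = 10;

        /* C89 style would force "int *p;" up here, leaving p indeterminate
           until the assignment. In C99 the declaration can wait: */
        int *p = malloc(n * sizeof *p);   /* p is never observable in an
                                             uninitialized state */
        if (p == NULL)
            return 1;

        p[0] = 123;
        printf("%d\n", p[0]);
        free(p);
        return 0;
    }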

+4
Jun 23 '10 at 15:20

First, forced initialization doesn't fix bugs. It masks them. Using a variable that doesn't hold a valid value (and what counts as valid is application dependent) is a bug.

Second, you can often do your own initialization. Instead of int *p; write int *p = NULL; or int *p = 0;. Use calloc() (which zero-initializes the memory) rather than malloc() (which doesn't). (No, all-bits-zero doesn't necessarily mean NULL pointers or floating-point values of 0. Yes, it does on most modern implementations.)
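
A short sketch of the malloc/calloc difference:

    #include <stdlib.h>

    int main(void)
    {
        int *m = malloc(100 * sizeof *m);  /* contents indeterminate */
        int *c = calloc(100, sizeof *c);   /* all 100 ints read as 0 */

        if (m == NULL || c == NULL)
            return 1;

        /* c[0] == 0 is guaranteed; reading m[0] before writing it is
           undefined behavior. Per the caveat above, calloc's all-bits-zero
           pattern reads back as NULL / 0.0 for pointer and floating-point
           elements on typical implementations, but not by the letter of
           the standard. */
        free(m);
        free(c);
        return 0;
    }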

Third, the C (and C++) philosophy is that you should be able to do things fast. Suppose the language gives you the choice between a safe way and a fast way to do something. You can't make the safe way faster by adding more code to it, but you can make the fast way safer by doing so. Moreover, you can sometimes perform operations both fast and safely, by guaranteeing that the operation is safe without additional checks - provided, of course, that you have the fast option to start from.

C was originally designed for writing an operating system and related code, and some parts of an operating system have to be as fast as possible. That's possible in C, less so in safer languages. Moreover, C was developed when the largest computers were less powerful than the phone in my pocket (which I'll be upgrading soon because it feels old and slow). Saving a few machine cycles in commonly used code could have visible results.

+3
Jun 23 '10

So, to summarize what ninjal explained: if you change your sample program a bit, the pointers will be initialized to NULL:

    #include <stdio.h>

    // Change the "storage" of the pointer variables from "stack" to "bss"
    int *randomA;
    int *randomB;

    int main(void)
    {
        int *nullA = NULL;
        int *nullB = NULL;

        printf("randomA: %p, randomB: %p, nullA: %p, nullB: %p\n\n",
               (void *)randomA, (void *)randomB, (void *)nullA, (void *)nullB);
        return 0;
    }

On my machine it prints:

randomA: 00000000, randomB: 00000000, nullA: 00000000, nullB: 00000000

+1
Jun 25 '10

I think it comes down to this: there is no reason memory should hold any particular value (0, NULL, or anything else) at power-on. So unless a location has been specifically written beforehand, it can contain any value, which from your point of view is random either way (the location may have been used earlier by some other software and thus contain a value that made sense for that application, such as a counter, but from your point of view it is just an arbitrary number).

Initializing it to a specific value costs at least one instruction, and there are situations where that a priori initialization is simply unnecessary: for example, v = malloc(x) assigns v either a valid address or NULL regardless of the initial contents of v. So initializing it first could be considered a waste of time, and a language (such as C) may choose not to do it a priori. Of course, nowadays the cost is mostly negligible, and there are languages in which uninitialized variables do get default values: null for pointers where supported, 0/0.0 for numeric types, and so on. Lazy initialization even makes it cheap to "initialize" an array of a million elements, since the elements are only really initialized when they are first accessed.
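
Spelling out the malloc point from the paragraph above (a sketch):

    #include <stdlib.h>

    int main(void)
    {
        int *v;           /* initializing v to NULL here would be wasted work... */
        v = malloc(100);  /* ...because this assignment overwrites it regardless,
                             with either a valid address or NULL */
        free(v);          /* free(NULL) is a no-op, so this is safe either way */
        return 0;
    }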

0
Jun 23 '10 at 14:12

The idea that this has anything to do with the arbitrary contents of memory at machine power-on is a myth, except on embedded systems. Any machine with virtual memory and a multitasking/multi-user operating system initializes memory (usually to 0) before handing it to a process; failing to do so would be a serious security violation. The "random" values in automatic-storage variables come from earlier use of the stack by the same process. Likewise, the "random" values in memory returned by malloc/new/etc. come from earlier allocations (since freed) in the same process.
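
A sketch of the heap half of that claim. Reading the second buffer is undefined behavior and the output depends entirely on the allocator, but common allocators hand back the just-freed block with its old contents intact:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        char *a = malloc(64);
        if (a == NULL) return 1;
        strcpy(a, "written by an earlier allocation");
        free(a);

        char *b = malloc(64);   /* often the very block just freed */
        if (b == NULL) return 1;
        /* Undefined behavior: on many allocators the old string is still
           here -- the "random" bytes come from within this process, not
           from whatever was in RAM at power-on. */
        printf("%.64s\n", b);
        free(b);
        return 0;
    }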

0
Jun 26 '10 at 5:33

For a pointer to point to NULL, it would have to be assigned NULL (even if that happened automatically and transparently).

So, to answer your question: the reason a pointer can't be both unassigned and NULL is that a pointer can't be both not assigned and assigned.

-1
Jun 23 '10 at 16:12


