What is the rationale for one past the last element of an array object?

Question

According to N1570 (C11 draft), 6.5.6/8 Additive operators:

Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object.

6.5.6/9 also contains:

Moreover, if the expression P points either to an element of an array object or one past the last element of an array object, and the expression Q points to the last element of the same array object, the expression ((Q)+1)-(P) has the same value as ((Q)-(P))+1 and as -((P)-((Q)+1)), and has the value zero if the expression P points one past the last element of the array object, even though the expression (Q)+1 does not point to an element of the array object. ¹⁰⁶⁾

This justifies pointer arithmetic like this:

    #include <stdio.h>

    int main(void)
    {
        int a[3] = {0, 1, 2};
        int *P, *Q;

        P = a + 3; // one past the last element
        Q = a + 2; // last element

        printf("%td\n", ((Q)+1)-(P));
        printf("%td\n", ((Q)-(P))+1);
        printf("%td\n", -((P)-((Q)+1)));

        return 0;
    }

I would have expected the standard to forbid a pointer from pointing one past the last element of an array object, since dereferencing it is undefined behavior (array overrun), which makes it potentially dangerous. Is there any rationale for allowing it?

+5

c c11

Grzegorz szpetkowski Dec 14 '14 at 18:26

2 answers

The rationale is pretty simple: the compiler is not allowed to place an array at the very end of memory. To illustrate, suppose we have a 16-bit machine with 16-bit pointers. The lowest address is 0x0000 and the highest address is 0xffff. If you declare char array[256] and the compiler locates array at address 0xff00, then technically the array fits into memory, using addresses 0xff00 through 0xffff inclusive. However, the expression

 char *endptr = &array[256]; // endptr points one past the end of the array

would be equivalent to

 char *endptr = NULL; // &array[256] = 0xff00 + 0x0100 = 0x0000

This means that the following loop would not work, since ptr would never be less than 0:

 for ( char *ptr = array; ptr < endptr; ptr++ )
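The wraparound described above can be reproduced on any machine by modelling 16-bit addresses with uint16_t. This is only a sketch under stated assumptions: the function name count_iterations is hypothetical, and a uint16_t value is merely a stand-in for a pointer on the imagined 16-bit machine, not how a real compiler represents pointers.

```c
#include <stdint.h>

/* Hypothetical model: a 16-bit "address" stands in for a char pointer.
   Returns how many times the loop body runs for an array of `size`
   bytes placed at address `base`. */
unsigned count_iterations(uint16_t base, uint16_t size)
{
    /* One-past-the-end address; the cast wraps modulo 0x10000,
       just as a 16-bit pointer register would. */
    uint16_t end = (uint16_t)(base + size);
    unsigned n = 0;

    for (uint16_t addr = base; addr < end; addr++)
        n++;

    return n;
}
```

With base 0xff00 and size 256 the end address wraps to 0x0000 and the loop body never runs, whereas with base 0xfe00 the end address is 0xff00 and the loop visits all 256 bytes.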

So the sections you cited are simply lawyer-speak for "Don't put arrays at the end of a memory region."

Historical note: the earliest x86 processors used a segmented memory scheme in which a memory address was specified by a 16-bit pointer register and a 16-bit segment register. The final address was computed by shifting the segment register left by 4 bits and adding the pointer, for example:

    pointer register     1234
    segment register    AB00
                        -----
    address in memory   AC234

The resulting address space was 1 MB, but there were end-of-memory boundaries every 64 KB. That is one reason to use lawyer-speak instead of saying "Don't put arrays at the end of memory" in plain English.
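The segment arithmetic in the historical note can be written out as a small sketch. The function name linear_address is hypothetical and chosen only for illustration. It implements just the shift-and-add rule described above and ignores every other detail of the hardware.

```c
#include <stdint.h>

/* Shift-and-add rule from the note above: the 16-bit segment value is
   shifted left by 4 bits and the 16-bit pointer (offset) is added. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For segment 0xAB00 and pointer 0x1234 this yields 0xAC234, matching the worked example. Because the pointer register itself is only 16 bits wide, incrementing it past 0xFFFF wraps back to 0x0000 within the same segment, which is why the end-of-memory problem recurs every 64 KB.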

+3

user3386109 Dec 14 '14 at 20:20

Pradhan · Accepted Answer · 2014-12-14T18:49:23+0000

Specifying the range of a loop as a half-open interval [start, end), especially for array indices, has some pleasing properties, as Dijkstra observed in one of his notes.

1) You can compute the size of the range simply as end - start. In particular, if the range is specified in terms of array indices, the number of iterations performed by the loop is end - start. If the range were [start, end], the number of iterations would be end - start + 1 - very annoying, isn't it? :)

2) Dijkstra's second observation applies only to (non-negative) integer indices. Ranges written as [start, end) and as (start, end] both have the property mentioned in 1). However, with (start, end] you must allow an index of -1 as start to represent a loop range that includes index 0, so you admit the "unnatural" value -1 just to represent the range. The [start, end) convention does not have this problem, because end is a non-negative integer and hence a natural choice when dealing with array indices.

Dijkstra's argument for avoiding -1 has a counterpart here: the half-open convention requires allowing a pointer one past the last valid address of the container. However, because this convention has been in use for so long, it probably persuaded the standards committee to make this exception.
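Both observations carry over directly to pointers in C. The following is a minimal sketch with hypothetical helper names (range_length and sum_range), assuming the caller passes two pointers into the same array, with end allowed to point one past the last element.

```c
#include <stddef.h>

/* Number of elements in the half-open range [start, end):
   simply end - start, with no "+ 1" correction. */
ptrdiff_t range_length(const int *start, const int *end)
{
    return end - start;
}

/* Sums the elements of [start, end). The pointer `end` is only
   compared against; it is never dereferenced. */
int sum_range(const int *start, const int *end)
{
    int total = 0;

    for (const int *p = start; p < end; p++)
        total += *p;

    return total;
}
```

For int a[3] = {0, 1, 2}, calling range_length(a, a + 3) gives 3, and an empty range such as [a + 3, a + 3) has length 0 without needing any out-of-range start value.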
