Is it legal to use one array field of a structure to access another?

As an example, consider the following structure:

struct S { int a[4]; int b[4]; } s; 

Would it be legal to write s.a[6] and expect it to be equal to s.b[2]? Personally, I feel that it should be UB in C++, while I'm not sure about C. However, I could not find anything relevant in the C and C++ language standards.




Update

There are several answers that suggest ways to ensure there is no padding between the fields so that the code works reliably. I would like to emphasize that if such code is UB, then the absence of padding is not enough. If it is UB, then the compiler can assume that the accesses s.a[i] and s.b[j] never overlap, and the compiler is free to reorder those memory accesses. For example,

  int x = s.b[2]; s.a[6] = 2; return x;

can be converted to

  s.a[6] = 2; int x = s.b[2]; return x;

which always returns 2.

+50
c++ c arrays struct
Nov 03 '17 at 10:59
9 answers

Would it be legal to write s.a[6] and expect it to be equal to s.b[2]?

No. Accessing an array out of bounds invokes undefined behavior in both C and C++.

C11 J.2 Undefined behavior

  • Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond the array object and is used as the operand of a unary * operator that is evaluated (6.5.6).

  • An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).

The C++ draft standard, section 5.7 [expr.add], paragraph 5, says:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
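To make the rule concrete: forming a pointer one past the end of an array is allowed, but dereferencing it, or moving further, is not. A minimal sketch (the helper name sum_a is hypothetical) that stays inside those rules by walking s->a up to its one-past-the-end pointer and only comparing against it:

```c
#include <assert.h>
#include <stddef.h>

struct S { int a[4]; int b[4]; };

/* Hypothetical helper: sums s->a by walking up to a one-past-the-end
   pointer. Forming end is allowed; dereferencing it is not. */
int sum_a(const struct S *s)
{
    assert(s != NULL);
    const int *end = s->a + 4;   /* one past the last element of a: OK to form */
    int total = 0;
    for (const int *p = s->a; p != end; ++p)
        total += *p;             /* every dereference is in bounds */
    return total;
}
```

By contrast, *(s->a + 6), i.e. s->a[6], is the undefined case quoted above, even if on a given compiler it happens to land on s->b[2].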

+60
Nov 03 '17 at 11:01

In addition to @rsp's answer (Undefined behavior for an array subscript that is out of range), I can add that accessing b through a is not legal, because the C language does not specify how much padding there can be between the end of the area allocated for a and the start of b. So even if you can make it work on a specific implementation, it is not portable.

    instance of struct:
    +-----------+----------------+-----------+---------------+
    |  array a  | maybe padding  |  array b  | maybe padding |
    +-----------+----------------+-----------+---------------+

The second padding may be absent, since the alignment of the struct object is the alignment of a, which is the same as the alignment of b, but the C language does not guarantee that this second padding is absent either.
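You can at least inspect what your implementation actually did with sizeof and offsetof. A minimal sketch, assuming the struct S from the question (the helper name gap_between_a_and_b is hypothetical); it only reports the layout and does not make s.a[6] defined:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct S { int a[4]; int b[4]; };

/* Hypothetical helper: prints the layout of struct S and returns the
   number of padding bytes between the end of a and the start of b. */
size_t gap_between_a_and_b(void)
{
    /* sizeof does not evaluate its operand, so the null pointer is never used */
    size_t end_of_a = offsetof(struct S, a) + sizeof(((struct S *)0)->a);
    size_t start_of_b = offsetof(struct S, b);
    assert(start_of_b >= end_of_a);   /* members never overlap */
    printf("a at %zu, b at %zu, sizeof(struct S) = %zu\n",
           offsetof(struct S, a), start_of_b, sizeof(struct S));
    return start_of_b - end_of_a;
}
```

The first member is always at offset 0; anything beyond that, including the gap, is up to the implementation.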

+32
Nov 03 '17 at 11:10

a and b are two different arrays, and a is defined as containing 4 elements. Therefore, a[6] accesses the array out of bounds and is therefore undefined behavior. Note that the array subscript a[6] is defined as *(a+6), so the proof of UB is actually given by the section "Additive operators" in conjunction with pointers. See the following section of the C11 standard (for example, this online draft) describing this aspect:

6.5.6 Additive operators

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i-n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

The same argument applies to C ++ (although not cited here).

Further, although it is clear that the undefined behavior comes from exceeding the bounds of array a, note that the compiler may insert padding between members a and b, so even if such pointer arithmetic were allowed, a+6 would not necessarily yield the same address as b+2.

+10
Nov 03 '17 at 11:08

Is it legal? No. As others have mentioned, it invokes undefined behavior.

Will this work? It depends on your compiler. This is the thing about Undefined behavior: it is undefined.

On many C and C++ compilers, the struct will be laid out so that b immediately follows a in memory, and there will be no bounds checking. So accessing a[6] will effectively be the same as accessing b[2] and will not cause any kind of exception.

Given

    struct S { int a[4]; int b[4]; } s;

and assuming no additional padding, the struct is really just a way of looking at a block of memory containing 8 integers. You could cast it to (int*), and ((int*)&s)[6] would refer to the same memory as s.b[2].
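If the goal is simply to index the eight ints as one sequence, a well-defined alternative is to do the split explicitly instead of relying on layout. A minimal sketch; the helper name elem is hypothetical:

```c
#include <assert.h>

struct S { int a[4]; int b[4]; };

/* Hypothetical helper: treats s->a and s->b as one logical 8-element
   sequence without ever indexing past the end of either array. */
int *elem(struct S *s, int i)
{
    assert(s != 0 && i >= 0 && i < 8);
    return i < 4 ? &s->a[i] : &s->b[i - 4];
}
```

With this, *elem(&s, 6) = 2; stores into s.b[2] regardless of any padding between a and b, and the compiler sees only in-bounds accesses.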

Should you rely on this behavior? Absolutely not. Undefined means the compiler doesn't have to support it. The compiler is free to lay out the struct in a way that makes the assumption &(s.b[2]) == &(s.a[6]) incorrect. The compiler could also add bounds checking on array accesses (although enabling compiler optimizations would probably disable such checks).

I have experienced the consequences of this before. It is quite common to have a struct like this:

    struct Bob {
        char name[16];
        char whatever[64];
    } bob;

    strcpy(bob.name, "some name longer than 16 characters");

Now bob.whatever will be " than 16 characters". (Which is why you should always use strncpy, BTW.)
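Note that strncpy on its own does not null-terminate the destination when the source is too long, so a bounded copy usually needs an explicit terminator as well. A minimal sketch; the helper name copy_name is hypothetical:

```c
#include <assert.h>
#include <string.h>

struct Bob { char name[16]; char whatever[64]; };

/* Hypothetical helper: copies at most sizeof b->name - 1 characters and
   always terminates, so an over-long name is truncated instead of
   spilling into b->whatever. */
void copy_name(struct Bob *b, const char *src)
{
    assert(b != NULL && src != NULL);
    strncpy(b->name, src, sizeof b->name - 1);
    b->name[sizeof b->name - 1] = '\0';
}
```

With the string from above, bob.name becomes "some name longe" and bob.whatever is left untouched.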

+6
Nov 03 '17 at 13:20

As mentioned in a comment by @MartinJames, if you need to ensure that a and b are in contiguous memory (or at least can be treated as such; edit: unless your architecture/compiler uses an unusual memory block size/offset and forced alignment that requires padding to be added), you need to use a union.

    union overlap {
        char all[8];    /* all the bytes in sequence */
        struct {        /* (anonymous struct so its members can be accessed directly) */
            char a[4];  /* padding may be added after this if the alignment is not a sub-factor of 4 */
            char b[4];
        };
    };

You cannot directly access b from a (e.g. a[6], as you asked), but you can access elements of both a and b through all (e.g. all[6] refers to the same memory location as b[2]).

(Edit: you could replace 8 and 4 in the above code with 2*sizeof(int) and sizeof(int), respectively, to be more likely to match the architecture's alignment, especially if the code needs to be more portable, but then you have to be careful not to make any assumptions about how many bytes are in a, b, or all. However, this will work on what are probably the most common memory alignments (1, 2, and 4 bytes).)

Here is a simple example:

    #include <stdio.h>

    union overlap {
        char all[2*sizeof(int)];    /* all the bytes in sequence */
        struct {                    /* anonymous struct so its members can be accessed directly */
            char a[sizeof(int)];    /* low word */
            char b[sizeof(int)];    /* high word */
        };
    };

    int main()
    {
        union overlap testing;
        testing.a[0] = 'a';
        testing.a[1] = 'b';
        testing.a[2] = 'c';
        testing.a[3] = '\0';    /* null terminator */
        testing.b[0] = 'e';
        testing.b[1] = 'f';
        testing.b[2] = 'g';
        testing.b[3] = '\0';    /* null terminator */
        printf("a=%s\n",testing.a);      /* output: a=abc */
        printf("b=%s\n",testing.b);      /* output: b=efg */
        printf("all=%s\n",testing.all);  /* output: all=abc */
        testing.a[3] = 'd';     /* makes printf keep reading past the end of a */
        printf("a=%s\n",testing.a);      /* output: a=abcdefg */
        printf("b=%s\n",testing.b);      /* output: b=efg */
        printf("all=%s\n",testing.all);  /* output: all=abcdefg */
        return 0;
    }
+5
Nov 04 '17 at 0:18

No, because accessing an array out of bounds invokes undefined behavior in both C and C++.

+3
Nov 03 '17 at 11:02

Short answer: No. You're in undefined behavior land.

Long answer: No. But that doesn't mean you can't access the data in other, shadier ways... if you're using GCC, you can do something like the following (elaborating on dwillis's answer):

    struct __attribute__((packed,aligned(4))) Bad_Access {
        int arr1[3];
        int arr2[3];
    };

and then you can access it via (Godbolt source + asm):

 int x = ((int*)ba_pointer)[4]; 

But this cast breaks strict aliasing, so it is only safe with g++ -fno-strict-aliasing. You could cast a pointer to the first member instead, but then you're back in the UB boat because you're accessing beyond the first member.

Alternatively, just don't do this. Save the future programmer (probably yourself) the grief of this mess.

Also, while we're at it, why not use std::vector? It's not perfect, but it has safeguards (such as the bounds-checked at()) to prevent this kind of bad behavior.

Addendum:

If you are really concerned about performance:

Say you have two pointers of the same type that you are accessing through. The compiler will most likely assume that both pointers may alias and will generate additional logic to protect you from doing something dumb.

If you solemnly swear to the compiler that you aren't aliasing, the compiler will reward you: Does the restrict keyword provide significant benefits in gcc/g++?

Bottom line: don't be evil; your future self and the compiler will thank you.
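For illustration, the promise in question is C99's restrict qualifier (g++ accepts __restrict__ as an extension). A minimal sketch with a hypothetical function name; calling it with overlapping arrays would be undefined behavior:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: because dst and src are restrict-qualified, the
   compiler may assume the two arrays do not overlap and is free to
   reorder or vectorize the loads and stores. */
void add_arrays(int *restrict dst, const int *restrict src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; ++i)
        dst[i] += src[i];
}
```

This is the same kind of no-overlap assumption the question worries about, except that here the programmer grants it explicitly.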

+1
Nov 03 '17 at 16:27

Jed Schuffs' answer is on the right track, but not quite right. If the compiler inserts padding between a and b, his solution will still fail. If, however, you declare:

    typedef struct {
        int a[4];
        int b[4];
    } s_t;

    typedef union {
        char bytes[sizeof(s_t)];
        s_t s;
    } u_t;

Now you can use (int*)(bytes + offsetof(s_t, b)) to get the address of s.b, regardless of how the compiler laid out the structure. The offsetof() macro is declared in <stddef.h>.
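A self-contained sketch of that approach, using the s_t and u_t types from above (the helper name b_from_bytes is hypothetical). Because the offset comes from offsetof, any padding the compiler inserted is accounted for:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int a[4]; int b[4]; } s_t;
typedef union { char bytes[sizeof(s_t)]; s_t s; } u_t;

/* Hypothetical helper: computes the address of u->s.b from the byte
   view of the union, stepping over a and any padding via offsetof. */
int *b_from_bytes(u_t *u)
{
    int *p = (int *)(u->bytes + offsetof(s_t, b));
    assert(p == u->s.b);   /* same address as the member itself */
    return p;
}
```

The arithmetic on u->bytes stays inside that char array, so it never runs off the end of a the way s.a[6] does.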

The expression sizeof(s_t) is a constant expression, which is legal in an array declaration in both C and C++. It will not produce a variable-length array. (Apologies for misreading the C standard earlier. I thought that sounded wrong.)

In the real world, however, two consecutive int arrays in a struct will be laid out as you expect. (You might be able to engineer a very contrived counterexample by setting the bound of a to 3 or 5 instead of 4 and then getting the compiler to align both a and b on a 16-byte boundary.) Rather than resorting to convoluted methods to get a program that makes no assumptions beyond the strict wording of the standard, you want some kind of defensive coding, such as static_assert(&both_arrays[4] == &s.b[0], "");. Such checks add no run-time overhead and will fail if your compiler does something that would break your program, as long as you don't invoke UB in the assertion itself.
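In standard C, an address comparison is not an integer constant expression, whereas offsetof is, so one way to write such a compile-time check in C11 is to compare offsets. A minimal sketch, assuming the struct S from the question:

```c
#include <assert.h>   /* provides the static_assert macro in C11 */
#include <stddef.h>

struct S { int a[4]; int b[4]; };

/* Compilation fails if the implementation put padding between a and b,
   i.e. if b does not start exactly where a ends. */
static_assert(offsetof(struct S, b) == sizeof(int[4]),
              "unexpected padding between a and b");

/* Compilation also fails if there is trailing padding after b. */
static_assert(sizeof(struct S) == sizeof(int[8]),
              "unexpected trailing padding in struct S");
```

As the question's update points out, passing these checks only rules out layout surprises; s.a[6] is still out of bounds as far as the optimizer is concerned.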

If you need a portable way to pack both sub-arrays into a contiguous memory range, or to split a memory block the other way, you can copy them with memcpy().
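A minimal sketch of that copying approach, assuming the struct S from the question (the helpers pack and unpack are hypothetical). Each memcpy stays within one member, so padding never matters:

```c
#include <assert.h>
#include <string.h>

struct S { int a[4]; int b[4]; };

/* Hypothetical helper: copies both sub-arrays into one contiguous
   8-element buffer, whatever padding struct S may contain. */
void pack(int out[8], const struct S *s)
{
    assert(out != NULL && s != NULL);
    memcpy(out, s->a, sizeof s->a);
    memcpy(out + 4, s->b, sizeof s->b);
}

/* Hypothetical helper: the reverse split, from the buffer back into s. */
void unpack(struct S *s, const int in[8])
{
    assert(s != NULL && in != NULL);
    memcpy(s->a, in, sizeof s->a);
    memcpy(s->b, in + 4, sizeof s->b);
}
```

In the packed buffer, index 6 really is the element that came from b[2], and indexing it is well-defined.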

+1
Nov 04 '17 at 8:21

The standard imposes no requirements on what implementations must do when a program uses an out-of-bounds array index in one struct field to access a member of another. Thus, such out-of-bounds accesses are “illegal” in strictly conforming programs, and programs that use them cannot be 100% portable and error-free at the same time. On the other hand, many implementations do define the behavior of such code, and programs targeted exclusively at those implementations may exploit that behavior.

There are three problems with this code:

  • While many implementations lay out structures in a predictable fashion, the standard allows implementations to add arbitrary padding before any member of a struct other than the first. Code can use sizeof or offsetof to ensure that the struct members are placed as expected, but the other two problems remain.

  • Given something like:

     if (structPtr->array1[x])
         structPtr->array2[y]++;
     return structPtr->array1[x];

    it would generally be useful for a compiler to presume that the use of structPtr->array1[x] will yield the same value as the preceding use in the if condition, even though doing so would change the behavior of code that relies on aliasing between the two arrays.

  • If array1[] has, for example, 4 elements, a compiler given something like:

     if (x < 4)
         foo(x);
     structPtr->array1[x] = 1;

    might conclude that, since there would be no defined cases where x is not less than 4, it could call foo(x) unconditionally.

Unfortunately, while programs can use sizeof or offsetof to ensure that there are no surprises in struct layout, there is no way for them to test whether compilers agree to refrain from optimizations of types #2 or #3. Further, the Standard is a bit vague about what would be meant in a case like:

    struct foo { char array1[4], array2[4]; };

    int test(struct foo *p, int i, int x, int y, int z)
    {
        if (p->array2[x]) {
            ((char*)p)[x]++;
            ((char*)(p->array1))[y]++;
            p->array1[z]++;
        }
        return p->array2[x];
    }

It’s pretty clear that the Standard would only define behavior if z is in the range 0..3, but since the type of p->array1 in that expression is char* (due to decay), it's not clear that the cast in the access using y would have any effect. On the other hand, since converting a pointer to the first element of a struct to char* should yield the same result as converting a pointer to the struct itself to char*, and the converted struct pointer should be usable to access all of the struct's bytes, it would seem that the access using x should be defined for (at minimum) x = 0..7 [if the offset of array2 is greater than 4, that would affect the value of x needed to hit members of array2, but some value of x could do so with defined behavior].

IMHO, a good remedy would be to define the subscript operator on array types in a way that does not involve pointer decay. In that case, the expressions p->array1[x] and &(p->array1[x]) could invite a compiler to assume that x is 0..3, but p->array1+x and *(p->array1+x) would require the compiler to allow for other values. I don't know whether any compilers do that, but the Standard doesn't require it.

0
Nov 04 '17 at 16:23


