Can we change the base address of an array through a pointer to the array, by brute force?

Someone wrote the following C program and asked why gcc allows one to "change the base address of an array." He knew the code was terrible but still wanted to know. I found the question interesting enough, because the relationship between arrays and pointers in C is subtle (behold the use of the address operator on an array! "Why would anybody do that?"), confusing, and consequently often misunderstood. The question was deleted, but I thought I would ask it again, with proper context and, I hope, a correct answer to go with it. Here is the original program.

    static char* abc = "kj";

    void fn(char**s)
    {
        *s = abc;
    }

    int main()
    {
        char str[256];
        fn(&str);
    }

It compiles with gcc (with warnings), links, and runs. What is going on here? Can we change the base address of an array by taking its address, casting it to a pointer to a pointer (after all, arrays are almost pointers in C, aren't they?) and assigning to it?

3 answers

It cannot work (even theoretically) because arrays are not pointers:


  • int arr[10] :

    • The amount of memory used: sizeof(int)*10 bytes

    • The values of arr and &arr are necessarily identical (numerically; their types differ)

    • arr points to a valid memory address, but cannot be set to point to a different memory address


  • int* ptr = malloc(sizeof(int)*10) :

    • The amount of memory used: sizeof(int*) + sizeof(int)*10 bytes

    • The values of ptr and &ptr are not necessarily identical (in fact, they are mostly different)

    • ptr can be set to point to valid and invalid memory addresses alike, as many times as you wish


The program does not change the "base address" of the array. It is not even trying.

What you pass to fn is the address of a 256-char chunk of memory. It is numerically identical to the pointer that str would decay to in other expressions, only differently typed. Here the array really stays an array: applying the address operator to an array is one of the cases where an array does not decay to a pointer. Incrementing &str, for example, would increase it numerically by 256. This matters for multidimensional arrays, which, as we know, are really one-dimensional arrays of arrays in C. Incrementing the first index of a "two-dimensional" array must advance the address to the start of the next "chunk" or "row".

Now here is the catch. As far as fn is concerned, the address you pass points to a location holding another address. That is not true; it points to a sequence of chars. In the instrumented version further down, which fills str with 'A's, printing that byte sequence interpreted as a pointer shows the byte value of 'A' (65 or 0x41) repeated.

fn, however, believing that the memory it is given holds an address, overwrites it with the address where "kj" resides in memory. Since str has enough room to hold an address, the assignment succeeds and leaves a usable address at that location.

It should be noted that this is, of course, not guaranteed to work. The most likely cause of failure would be alignment problems: str is not, I believe, required to be properly aligned for a pointer value. The standard also mandates that function arguments be assignment-compatible with the parameter declarations, and arbitrary pointer types cannot be assigned to each other (one has to go through void pointers for that, or cast).

Edit: david.pfx pointed out that (even with a proper cast) the code invokes undefined behavior. The standard requires objects to be accessed through lvalues of compatible type (including dereferenced pointers), in section 6.5/7 of the latest public draft. When the argument is cast properly and the code is compiled with gcc -fstrict-aliasing -Wstrict-aliasing=2 ..., gcc warns about "type punning". The rationale is that the compiler should be free to assume that incompatible pointers do not modify the same memory region; here it is not required to assume that fn changes the contents of str. This allows the compiler to optimize away reloads (for example, from memory into a register) that would otherwise be necessary. This comes into play with optimization; a likely scenario is a debugging session that fails to reproduce the error (namely, when the program being debugged is compiled without optimization for debugging purposes). That said, I would be surprised if a non-optimizing compiler produced unexpected results here, so I left the rest of the answer as it is.

I have added some debugging printfs to illustrate what is going on. Here is a live example: http://ideone.com/aL407L.

    #include <stdio.h>
    #include <string.h>

    static char* abc = "kj";

    // Helper function to print the first bytes a char pointer points to
    void printBytes(const char *const caption, const char *const ptr)
    {
        size_t i;
        printf("%s: {", caption);
        for (i = 0; i < sizeof(char *) - 1; ++i)
        {
            // Cast to unsigned char so bytes >= 0x80 are not sign-extended
            printf("0x%x,", (unsigned char)ptr[i]);
        }
        printf("0x%x ...}\n", (unsigned char)ptr[sizeof(char *) - 1]);
    }

    // What exactly does this function do?
    void fn(char **s)
    {
        printf("Inside fn: Argument value is %p\n", (void *)s);
        printBytes("Inside fn: Bytes at address above are", (char *)s);

        // This would likely crash: *s is not a valid address.
        // printf("contents: ->%s<-\n", *s);

        *s = abc;
        printf("Inside fn: Bytes at address above after assignment\n");
        printBytes("   (should be address of \"kj\")", (char *)s);

        // Now *s holds a valid address (that of "kj").
        printf("Inside fn: Printing *s as string (should be kj): ->%s<-\n", *s);
    }

    int main()
    {
        char str[256];

        printf("size of ptr: %zu\n", sizeof(void *));
        strcpy(str, "AAAAAAAA"); // 9 defined bytes

        printf("addr of \"kj\": %p\n", (void *)abc);
        printf("str addr: %p (%p)\n", (void *)&str, (void *)str);
        printBytes("str contents before fn", str);

        printf("------------------------------\n");
        // Parameter type does not match! Illegal code
        // (6.5.16.1 of the latest public draft; incompatible
        // types for assignment).
        fn(&str);
        printf("------------------------------\n");

        printBytes("str contents after fn (ie abc -- note byte order!): ", str);
        printf("str addr after fn -- still the same! --: %p (%p)\n", (void *)&str, (void *)str);

        return 0;
    }

What you have here is simply Undefined Behavior.

The function parameter is declared as a pointer to pointer to char. The argument passed to it is a pointer to array of 256 char. The standard permits conversions from one pointer type to another, but since the object pointed to is not a pointer to char, dereferencing the pointer is Undefined Behavior.

n1570 §6.5.3.2/4:

If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.

There is little point in speculating about how Undefined Behavior will play out on different implementations. It is just plain wrong.


To be clear, the UB is in this line:

 *s=abc; 

The pointer s does not point to an object of the correct type (char *), so applying * to it is UB.


Source: https://habr.com/ru/post/974824/

