In C, what are the rules governing how compilers merge identical strings in an executable?

I am trying to find the rules that C and C++ compilers follow when placing strings in the data section of an executable, and I do not know where to look. I would like to know whether the C/C++ specification guarantees that the addresses of all of the following will be the same:

    char *test1 = "hello";
    const char *test2 = "hello";
    static char *test3 = "hello";
    static const char *test4 = "hello";
    extern const char *test5; // Defined in another compilation unit as "hello"
    extern const char *test6; // Defined in another shared object as "hello"

When I test on Windows, they are all the same. However, I do not know whether that holds on all operating systems.

+4
4 answers

"I would like to know whether the C/C++ specification guarantees that the addresses of all of the following will be the same"

Identical string literals may be the same object, but they are not required to be.

C++ says:

(C++11, 2.14.5p12) "Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined. The effect of attempting to modify a string literal is undefined."

C says:

(C11, 6.5.2.5p7) "String literals, and compound literals with const-qualified types, need not designate distinct objects." The accompanying footnote 101 adds: "This allows implementations to share storage for string literals and constant compound literals with the same or overlapping representations."

And the C99 Rationale says:

"This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and to perform certain optimizations."
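
As a rough illustration of what these passages permit, the sketch below compares the addresses of two identical literals. The names first, second and literals_share_address are made up for this example and do not come from the question. Either result is conforming, so portable code should compare contents, not addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Two pointers initialized from textually identical literals. */
static const char *first  = "hello";
static const char *second = "hello";

/* Hypothetical helper: reports whether this particular implementation
   happened to store both literals at the same address.
   Both true and false are conforming results. */
bool literals_share_address(void)
{
    /* The contents, unlike the addresses, are guaranteed to match. */
    assert(strcmp(first, second) == 0);
    return first == second;
}
```

Calling literals_share_address() with different compilers or optimization settings may give different answers, which is the point: only the strcmp result can be relied upon.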

+8

Firstly, it has nothing to do with the operating system. It depends entirely on the implementation, that is, on the compiler.

Secondly, the only "guarantees" you can hope for here come from your compiler's documentation. The formal language rules neither guarantee that the addresses will be the same nor guarantee that they will be different. (This applies to both C and C++.)

Thirdly, some compilers offer unusual options such as "make string literals modifiable". Such an option usually implies that each literal is allocated in its own storage area and has a unique address.

+1

All of them can be the same. In the following, even x and y may be the same, and z may overlap with y:

    const char *x = "hello";
    const char *y = "hello\0folks";
    const char *z = "folks";
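
One way to observe the overlap case is to test whether z points into the tail of y. This is only a sketch: tail_is_shared is a hypothetical name, and the result depends entirely on the compiler and linker.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static const char *y = "hello\0folks";
static const char *z = "folks";

/* Hypothetical helper: true if the implementation stored "folks"
   as the tail of the longer literal. Either result is conforming. */
bool tail_is_shared(void)
{
    /* y + 6 is the 'f' of "folks" inside the longer literal, so the
       contents at that position always compare equal to z. */
    assert(strcmp(y + 6, z) == 0);
    return z == y + 6;
}
```

The equality comparison itself is well defined in both outcomes; only its value is left to the implementation.
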
0

In C, I believe the only guarantee for a string literal is that it evaluates to a pointer to a readable region of memory which, provided the program does not invoke Undefined Behavior, will always contain the specified characters followed by a null byte. The compiler and linker are allowed to cooperate in whatever way is convenient to make this happen. Although I don't know of any compiler/linker systems that do this, it would be perfectly legal for a compiler to put each string literal in its own constant section, and for the linker to place such sections in decreasing order of length, checking before placing each one whether a matching sequence of bytes has already been placed. Note that the sequence of bytes need not even come from a string literal or a declared constant; if the linker is trying to place the string "Hi!" and notices that the machine code contains the byte sequence [0x48, 0x69, 0x21, 0x00], the literal could evaluate to a pointer to the first of those bytes.

Note that writing to the memory pointed to by a string literal is Undefined Behavior. Depending on the system, such a write may trap, do nothing, or affect only the literal written, but it may also have completely unpredictable consequences (for example, if the literal evaluated to a pointer into some machine code).
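
The placement strategy described above can be simulated in a few lines. This is a toy model, not how any real linker is implemented: struct pool, pool_place and POOL_CAPACITY are names invented for the sketch, and the caller is assumed to place longer strings first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_CAPACITY 256

/* Hypothetical stand-in for a read-only section being filled. */
struct pool {
    char   bytes[POOL_CAPACITY];
    size_t used;
};

/* Return the offset of a copy of s (terminator included) in the pool.
   If a matching byte sequence was already placed, reuse it;
   otherwise append s at the end. */
size_t pool_place(struct pool *p, const char *s)
{
    size_t need = strlen(s) + 1;   /* include the null byte */

    /* Scan every position where s could already be present. */
    for (size_t off = 0; off + need <= p->used; ++off) {
        if (memcmp(p->bytes + off, s, need) == 0)
            return off;            /* reuse bytes already placed */
    }

    assert(p->used + need <= POOL_CAPACITY);
    memcpy(p->bytes + p->used, s, need);
    size_t off = p->used;
    p->used += need;
    return off;
}
```

Starting from a zero-initialized pool, placing "say hello" and then "hello" gives offsets 0 and 4: the shorter string is found inside the longer one and consumes no new space.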
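
If the characters really need to be changed, a well-defined alternative is to initialize an array from the literal and modify the array. The function name capitalize_greeting below is hypothetical and serves only to illustrate this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes "Hello" into out, which must have room
   for 6 bytes. All writes go to a private array, never to the literal. */
void capitalize_greeting(char *out)
{
    assert(out != NULL);

    /* An array initialized from a literal is a distinct, writable
       object holding its own copy of the characters. */
    char buf[] = "hello";
    buf[0] = 'H';

    memcpy(out, buf, sizeof buf);  /* copies 6 bytes, null included */
}
```

By contrast, declaring char *p = "hello" and then assigning to p[0] would be exactly the undefined write described above.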

0

Source: https://habr.com/ru/post/1488732/

