C: alignment of data structures

I work with structures and have a few questions about them. As far as I understand, the members of a struct are placed in memory sequentially, in blocks (words) whose length depends on the machine architecture (32 bits - 4 bytes, 64 bits - 8 bytes).

Let's say we have 2 data structures:

    struct ST1 {
        char   c1;
        short  s;
        char   c2;
        double d;
        int    i;
    };

In memory, it will be:

 32 bit - 20 bytes

 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
 ------------------------------------------------------------------------------------------
 c1| PB| s | s | c2| PB| PB| PB| d | d | d  | d  | d  | d  | d  | d  | i  | i  | i  | i  |

 64 bit - 24 bytes

                      | 20 | 21 | 22 | 23 |
 previous sequence +  ---------------------
                      | PB | PB | PB | PB |

 (PB = padding byte)

But we can reorder the members so that the data packs into machine words without padding. Like this:

    struct ST2 {
        double d;
        int    i;
        short  s;
        char   c1;
        char   c2;
    };

In this case, for 32 and 64 bits it will be represented in the same way (16 bytes):

 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
 ----------------------------------------------------------------------
 d | d | d | d | d | d | d | d | i | i | i  | i  | s  | s  | c1 | c2 |

I have a couple of questions:

  • This may be a wild guess, but is the basic rule for a struct to declare the largest members first?
  • As I understand it, this does not apply to standalone variables, such as char str[] = "Hello"; ?
  • The padding byte: what value does it have? Is it somewhere in the ASCII table? Sorry, I could not find it.
  • Two structures whose members are all stored at different addresses: can they be placed non-sequentially in memory?
  • A structure such as struct ST3 { char c1; char c2; char c3; } st3; has size = 3. I understand that if we add a member of a different type, it will be aligned. But why is it not aligned before that?
+6
source share
5 answers

The basic rules are simple:

  • members must be laid out in declaration order (unless, in C++, you use private: / public: ... sections)
  • padding is allowed between members and after the last one

That's about it. The rest is left to the implementation: the storage taken by each type and the amount of padding. You can usually expect this to be properly documented in the ABI or directly in the compiler documentation, and there are even tools for manipulating it.

In practice, padding is required on some architectures: SPARC, say, requires 32-bit ints to be aligned to an address divisible by 4. On others it is not a requirement, but misaligned objects may take longer to process: the 80286 processor, say, takes an extra cycle to read a 16-bit object from an odd address. (Before I forget: the representation of the types themselves differs too!)

Usually the alignment that is required, or that performs best, matches the size exactly: you want to align an object to a boundary equal to its size. A good counterexample is the 80-bit floating-point number (available as double or long double in some compilers), which prefers 8- or 16-byte alignment, not 10.
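One way to see how size and alignment relate on a particular compiler is simply to print both. The following is a minimal sketch, assuming a C11 compiler (for _Alignof); the SHOW macro and the helper name print_size_and_alignment are made up for illustration, and the numbers it prints are implementation-defined.

```c
#include <stdio.h>

/* Prints the size and the alignment of a type as seen by this compiler. */
#define SHOW(type) \
    printf("%-12s size %2zu  align %2zu\n", #type, sizeof(type), _Alignof(type))

/* Hypothetical helper: call it from main() to inspect your own platform. */
void print_size_and_alignment(void)
{
    SHOW(char);
    SHOW(short);
    SHOW(int);
    SHOW(long);
    SHOW(void *);
    SHOW(double);
    SHOW(long double);   /* often the one whose size and alignment differ */
}
```

Running it on each target you care about is a quick way to compare against the ABI documentation.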

To play with padding, the compiler usually has a switch that sets the default. This varies from version to version, so it is best checked when upgrading. There are also in-code override facilities such as __attribute__((packed)) in gcc and #pragma pack in MS compilers, and many others. All of these are, obviously, extensions to the standard.

The bottom line is that if you want to tinker with the layout, you start by reading the documentation of every compiler you target, now and in the future, to know what they do and how to control it. You may also want to read the documentation of the target platforms, depending on why you are interested in the layout in the first place.

One common motivation is to have a stable layout when you write raw memory to a file and expect to read it back, perhaps on another platform or with a different compiler. This is the easier path, until a new kind of platform appears on the scene.

Another motivation is performance. This one is much trickier, because the rules change quickly and the effect is hard to predict up front. On Intel, say, the basic "misaligned" penalty has been gone for a long time; what matters instead is staying inside a cache line, and the cache line size depends on the processor. In addition, using more padding can make individual accesses faster, while fully packed structures use the cache more economically.

And some operations require proper alignment that the compiler does not enforce on its own; you may need to apply special alignment pragmas or attributes (for example, for certain SSE operations).
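As an illustration of requesting stricter alignment than the default, here is a minimal sketch that uses the C11 _Alignas specifier rather than a compiler-specific pragma. The struct Vec4 and both helper functions are hypothetical names, and 16 bytes is chosen only because it is the boundary commonly associated with aligned SSE loads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-float vector; _Alignas(16) asks for 16-byte alignment (C11). */
struct Vec4 {
    _Alignas(16) float v[4];
};

/* Returns 1 if p lies on a 16-byte boundary, 0 otherwise. */
int is_aligned_16(const void *p)
{
    return (uintptr_t)p % 16 == 0;
}

/* Checks that a single object and an array element both honour the request. */
void check_vec4_alignment(void)
{
    struct Vec4 one;
    struct Vec4 many[3];

    assert(is_aligned_16(&one));
    assert(is_aligned_16(&many[1]));
}
```

Because the alignment is part of the type, the compiler also rounds the struct size up so that every array element stays on the requested boundary.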

To repeat the bottom line: stop guessing, decide on your targets, and read the proper documentation. (By the way, I found reading the architecture manuals for SPARC, IA32 and others great fun and rewarding in many ways.)

+3
source

Answering your questions as posed (ignoring your very beautiful picture of the structure)

This may be a wild guess, but is the basic rule for a struct to declare the largest members first?

Always put the members that require the strictest alignment first. For example, I would not put a char[99] first. In general this works out as pointers, then 64-bit native types, then 32-bit native types, and so on, but you have to be very careful if your structure contains members that are themselves structures.
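The effect of member order can be checked with offsetof and sizeof. This sketch reuses the ST1 and ST2 definitions from the question; the function name print_layouts is made up for illustration. The values are implementation-defined; on the platforms described in the question it should report 20 or 24 bytes for ST1 and 16 bytes for ST2.

```c
#include <stddef.h>
#include <stdio.h>

struct ST1 { char c1; short s; char c2; double d; int i; };
struct ST2 { double d; int i; short s; char c1; char c2; };

/* Prints the offset of every member and the total size of both layouts. */
void print_layouts(void)
{
    printf("ST1: c1=%zu s=%zu c2=%zu d=%zu i=%zu size=%zu\n",
           offsetof(struct ST1, c1), offsetof(struct ST1, s),
           offsetof(struct ST1, c2), offsetof(struct ST1, d),
           offsetof(struct ST1, i), sizeof(struct ST1));

    printf("ST2: d=%zu i=%zu s=%zu c1=%zu c2=%zu size=%zu\n",
           offsetof(struct ST2, d), offsetof(struct ST2, i),
           offsetof(struct ST2, s), offsetof(struct ST2, c1),
           offsetof(struct ST2, c2), sizeof(struct ST2));
}
```

Gaps between one member's end and the next member's offset are the padding bytes.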

As I understand it, this does not apply to standalone variables, such as char str[] = "Hello"; ?

I don't really understand this. If you define a char array on the stack, it has char alignment. If you define a char array followed by an int, there will probably be padding on the stack; you just have no way of finding it.

The padding byte: what value does it have? Is it somewhere in the ASCII table? Sorry, I could not find it.

It is neither code nor data. It is padding inserted by the compiler and may contain any value, which may or may not differ between instances of the structure within the same run or across different runs of the program.
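A practical consequence is that comparing two structures byte by byte (for example with memcmp) can report a difference even when every member is equal. The sketch below compares member by member instead; struct Record and both functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record; most ABIs insert padding between tag and value. */
struct Record {
    char tag;
    int  value;
};

/* Member-wise comparison: never looks at padding bytes. */
int record_equal(const struct Record *a, const struct Record *b)
{
    return a->tag == b->tag && a->value == b->value;
}

/* Fills two objects with different junk, then stores identical members. */
void check_record_equal(void)
{
    struct Record a, b;

    memset(&a, 0xAA, sizeof a);   /* any padding bytes now hold 0xAA */
    memset(&b, 0x55, sizeof b);   /* any padding bytes now hold 0x55 */
    a.tag = b.tag = 'x';
    a.value = b.value = 42;

    assert(record_equal(&a, &b)); /* members match regardless of padding */
}
```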

2 structures with all elements represented in memory by different addresses, and they can be placed not sequentially in memory?

I do not understand this. Are you asking whether the compiler can insert padding between structures? If not, please clarify, because otherwise this answer will not be very useful.

When the compiler lays out a structure, it has to make it possible for you to create an array of such structures. Consider this:

    struct S { int wibble; char wobble; };
    struct S stuff[2];

If the compiler does not insert 3 bytes of padding after wobble, accesses to stuff[1].wibble will not be properly aligned, resulting in crashes on some hardware (and atrocious performance on other hardware). Basically, the compiler must add padding at the end so that the most strictly aligned member is always correctly aligned in an array of such structures.
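The trailing padding can be measured directly. This is a minimal sketch using the struct S from above, assuming a C11 compiler; the helper names trailing_padding and check_array_alignment are made up for illustration, and the result (3 on typical platforms with a 4-byte int) is implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct S {
    int  wibble;
    char wobble;
};

/* Bytes the compiler added after the last member (implementation-defined). */
size_t trailing_padding(void)
{
    return sizeof(struct S) - (offsetof(struct S, wobble) + sizeof(char));
}

/* Every element's wibble must land on an address suitable for an int. */
void check_array_alignment(void)
{
    struct S stuff[2];

    assert((uintptr_t)&stuff[0].wibble % _Alignof(int) == 0);
    assert((uintptr_t)&stuff[1].wibble % _Alignof(int) == 0);
}
```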

A structure such as struct ST3 { char c1; char c2; char c3; } st3; has size = 3. I understand that if we add a member of a different type, it will be aligned. But why is it not aligned before that?

Do you mean, "Why doesn't the compiler put it in a place where it is correctly aligned"? Because the language does not allow it. The compiler is not allowed to reorder the members of your structure. It is only allowed to insert padding.

0
source

The alignment of struct (and class) members depends on the platform, true, but also on the compiler. Members are aligned to their size for performance reasons: aligning an integer type to its size reduces the number of memory accesses needed to read it.

You can usually force the compiler to reduce alignment, but this is not a good idea except for specific reasons (for example, for data exchanged between different platforms, such as communication data). Visual C++ has #pragma pack for this, for example:

    #pragma pack(1)
    struct ST1 { char c1; short s; char c2; double d; int i; };
    #pragma pack()   /* restore the default packing */

    assert(sizeof(struct ST1) == 16);

But, as I said, this is usually not a good idea.

Remember that the compiler does not just add padding bytes after some fields. It also ensures that the structure is placed in memory so that all fields are properly aligned. In your ST1 example, since the largest field type is double, the compiler will make sure that the d field is aligned to 8 bytes (unless #pragma pack or a similar option is used):

    struct ST1 st1;
    assert((uintptr_t)&st1.d % 8 == 0);   /* uintptr_t comes from <stdint.h> */

About your questions:

  • If you want to save space, yes, ordering the fields by size, largest first, is a good trick. For nested structures, use the size of the largest field of the inner structure, not the size of the structure itself.
  • It does apply to standalone variables. But the compiler may reorder variables in memory (unlike members of structures and classes).

For instance:

    short   s[27];
    int32_t i32[34];
    int64_t i64[45];

    /* A pointer must be converted to an integer before taking a remainder. */
    assert((uintptr_t)s   % 2 == 0);
    assert((uintptr_t)i32 % 4 == 0);
    assert((uintptr_t)i64 % 8 == 0);

  • Padding bytes can contain anything. Usually it is uninitialized data (at least if you do not initialize it). Sometimes they contain a specific byte pattern written by the compiler for debugging purposes.
  • About structures with all members represented in memory at different addresses: sorry, I do not understand what you are asking.
  • Standard C++ says that the address of a struct/class must be the same as the address of its first field. So padding is only possible after c3, never before c1.

From N3337 (C++11), 9.2 [class.mem], paragraph 20:

A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. -end note]
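C gives the same guarantee for the first member of a structure, so it can be checked in plain C as well. This is a minimal sketch using the ST3 structure from the question; the function name check_first_member is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct ST3 { char c1; char c2; char c3; };

/* The first member sits at offset 0, so both pointers compare equal. */
void check_first_member(void)
{
    struct ST3 st3 = { 'a', 'b', 'c' };

    assert((void *)&st3 == (void *)&st3.c1);
    assert(offsetof(struct ST3, c1) == 0);
}
```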

0
source

gcc on Intel architecture needs more instructions and cycles to access (read or write) an odd-numbered memory address, so padding is added to reach an even-numbered address.

0
source

Be careful: you cannot be sure how your variables are aligned (although they often are). If you use GCC, you can use the packed attribute to make sure the members of a structure are laid out without padding.

Example:

 struct foo { char c; int x; } __attribute__((packed)); 
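To see what the attribute changes, the packed and default layouts can be compared side by side. This is a sketch that assumes GCC or Clang, since __attribute__((packed)) is a compiler extension; the names foo_default, foo_packed and bytes_saved_by_packing are made up for illustration.

```c
#include <stddef.h>

/* Default layout: the compiler may pad after c so that x is aligned. */
struct foo_default { char c; int x; };

/* GCC/Clang extension: no padding is inserted between the members. */
struct foo_packed { char c; int x; } __attribute__((packed));

/* Bytes saved by packing on this implementation. */
size_t bytes_saved_by_packing(void)
{
    return sizeof(struct foo_default) - sizeof(struct foo_packed);
}
```

Note that in the packed version x is no longer guaranteed to sit on an int boundary, which is exactly the misalignment cost discussed in the other answers.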

As I understand it, this does not apply to standalone variables, such as char str[] = "Hello"; ?

This array will be aligned in memory.

-1
source

Source: https://habr.com/ru/post/946758/

