Int16 - how many bytes does it actually occupy?

Here is why I ask:

short a=0; Console.Write(Marshal.SizeOf(a)); 

shows 2

But when I look at the IL code, I see:

 /*1*/ IL_0000: ldc.i4.0
 /*2*/ IL_0001: stloc.0
 /*3*/ IL_0002: ldloc.0
 /*4*/ IL_0003: box        System.Int16
 /*5*/ IL_0008: call       System.Runtime.InteropServices.Marshal.SizeOf
 /*6*/ IL_000D: call       System.Console.Write

The ldc instruction on line #1 means:

Push 0 onto the stack as int32.

So it should occupy 4 bytes.

But Marshal.SizeOf shows 2 bytes...

What am I missing here? How many bytes does it really take in memory?

I have heard of cases where values are padded out to 4 bytes so they are faster to work with. Is that what is happening here?

(Please ignore the syncRoot and the GC flag byte; I'm only asking about 2 vs. 4.)

+6
4 answers

It is pretty easy to understand what happens once you look at the available LDC instructions. Note the limited set of operand types: there is no version that loads a constant of type short, only int, long, float and double. These restrictions are visible elsewhere as well; the Opcodes.Add instruction, for example, is similarly limited, and adding variables of one of the smaller types is not supported.

The IL instruction set was deliberately designed to match the capabilities of a simple 32-bit processor. The kind of processor to think of is the RISC kind, which had its heyday in the nineties: lots of 32-bit CPU registers that can only manipulate 32-bit integers and IEEE-754 floating-point types. The Intel x86 core is not a good example; although it is very commonly used, it is a CISC design that actually supports loading and doing arithmetic on 8-bit and 16-bit operands. But that is more of a historical accident; it made mechanical translation of programs that started on the 8-bit 8080 and 16-bit 8086 processors easier. That capability does not come for free, though: manipulating 16-bit values actually costs an extra CPU cycle.

Making IL a good match for the capabilities of a 32-bit processor clearly makes the job of the person writing the jitter much easier. Storage locations can still be smaller, but only loads, stores and conversions need to be supported, and only when necessary. Your variable "a" is a local variable, one that occupies 32 bits in the stack frame or in a CPU register anyway. Only stores to memory need to be truncated to the right size.

There is no ambiguity in the code snippet. The value of the variable must be boxed, because Marshal.SizeOf() takes an argument of type object. The boxed value identifies the type of the value by its type handle, which points to System.Int16. Marshal.SizeOf() has the built-in knowledge to know that this takes 2 bytes.
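To make the boxing visible, here is a minimal sketch (the class and variable names are mine); it performs the box explicitly, inspects the type handle via GetType(), and compares Marshal.SizeOf with the compile-time sizeof:

    using System;
    using System.Runtime.InteropServices;

    class BoxedSizeDemo
    {
        static void Main()
        {
            short a = 0;

            // Boxing happens because Marshal.SizeOf(object) takes an object;
            // the type handle in the box identifies the value as System.Int16.
            object boxed = a;
            Console.WriteLine(boxed.GetType());        // System.Int16
            Console.WriteLine(Marshal.SizeOf(boxed));  // 2
            Console.WriteLine(sizeof(short));          // 2, a compile-time constant
        }
    }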

These restrictions do reflect on the C# language and cause inconsistency. This kind of compile error forever confuses and annoys C# programmers:

  byte b1 = 127;
  b1 += 1;      // no error
  b1 = b1 + 1;  // error CS0266

This is a consequence of the IL restrictions: there is no add instruction that accepts byte operands. The operands must be converted to the next larger compatible type, int in this case, so that the code can run on a 32-bit RISC processor. Now there is a problem: the 32-bit int result has to be hammered back into a variable that can store only 8 bits. The C# language applies the hammer itself in the first assignment, but illogically requires a cast hammer in the second one.
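As an illustration of that "cast hammer" (my own sketch, not part of the original answer): the result of b1 + 1 has type int, so the second assignment only compiles once the narrowing conversion is written out explicitly:

    byte b1 = 127;
    b1 += 1;              // ok: the compiler inserts the narrowing conversion itself
    b1 = (byte)(b1 + 1);  // ok: the explicit cast the compiler demands
    // b1 = b1 + 1;       // error CS0266: cannot implicitly convert 'int' to 'byte'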

+5

The CLI specification spells out which data types are allowed on the evaluation stack. A 16-bit short integer is not one of them, so such integers are converted to 32-bit (4-byte) integers when they are loaded onto the stack.

Section III.1.1 contains all the details:

1.1 Data Types

While the CTS defines a rich type system and the CLS specifies a subset that can be used for language interoperability, the CLI itself deals with a much simpler set of types. These types include user-defined value types and a subset of the built-in types. The subset, collectively called the "basic CLI types", contains the following types:

  • A subset of the full numeric types (int32, int64, native int, and F).
  • Object references (O), without distinction of the type of object referenced.
  • Pointer types (native unsigned int and &), without distinction as to the type pointed to.

Note that object references and pointer types can be assigned the value null. This is defined throughout the CLI to be zero (a bit pattern of all-bits-zero).

1.1.1 Numeric Data Types

  • The CLI only operates on the numeric types int32 (4-byte signed integers), int64 (8-byte signed integers), native int (native-size integers), and F (native-size floating-point numbers). However, the CIL instruction set allows additional data types to be implemented:

  • Short integers: the evaluation stack only holds 4- or 8-byte integers, but other locations (arguments, local variables, statics, array elements, fields) can hold 1- or 2-byte integers. For the purpose of stack operations, the bool and char types are treated as unsigned 1-byte and 2-byte integers, respectively. Loading from these locations onto the stack converts them to 4-byte values by:

    • zero-extending for types unsigned int8, unsigned int16, bool and char;
    • sign-extending for types int8 and int16;
    • zero-extending for unsigned indirect and element loads (ldind.u*, ldelem.u*, etc.); and
    • sign-extending for signed indirect and element loads (ldind.i*, ldelem.i*, etc.).

Storing to integers, booleans, and characters (stloc, stfld, stind.i1, stelem.i2, etc.) truncates. Use the conv.ovf.* instructions to detect when this truncation results in a value that does not correctly represent the original value.

[Note: Short (i.e. 1- and 2-byte) integers are loaded as 4-byte numbers on all architectures, and these 4-byte numbers are always tracked as distinct from 8-byte numbers. This helps portability of code by ensuring that the default arithmetic behavior (i.e. when no conv or conv.ovf instruction is executed) has identical results on all implementations.]

Convert instructions that yield short integer values actually leave an int32 (32-bit) value on the stack, but it is guaranteed that only the low bits have meaning (i.e. the more significant bits are all zero for the unsigned conversions, or a sign extension for the signed conversions). To correctly simulate the full set of short integer operations, a conversion to a short integer is required before the div, rem, shr, comparison and conditional branch instructions.

… etc.
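To connect this to C#, here is a rough sketch (my own example, not from the specification) of the truncate-on-store behavior and of how checked arithmetic maps to the conv.ovf.* instructions:

    using System;

    class TruncationDemo
    {
        static void Main()
        {
            int big = 0x12345;                      // does not fit in 16 bits

            // A plain store truncates: only the low 16 bits survive (conv.i2).
            short truncated = unchecked((short)big);
            Console.WriteLine(truncated);           // 9029 (0x2345)

            // checked compiles to conv.ovf.i2 and detects the lossy narrowing.
            try
            {
                short guarded = checked((short)big);
                Console.WriteLine(guarded);
            }
            catch (OverflowException)
            {
                Console.WriteLine("overflow detected");
            }
        }
    }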

Speaking speculatively, this decision was probably made either for architectural simplicity or for speed (or possibly both). Modern 32-bit and 64-bit processors can work with 32-bit integers more efficiently than with 16-bit integers, and since every integer that can be represented in 2 bytes can also be represented in 4 bytes, this behavior is reasonable.

The only time it would really make sense to use a 2-byte integer rather than a 4-byte one is when you care more about memory usage than about speed or efficiency. In that case you will have a whole bunch of those values, probably packed into a struct, and that is when the result of Marshal.SizeOf matters.
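A small sketch of that scenario (the Sample struct is hypothetical): a struct packed with shorts, where the 2-byte size actually pays off and Marshal.SizeOf reports the marshaled size of the whole thing:

    using System;
    using System.Runtime.InteropServices;

    // Four shorts laid out sequentially; no padding is needed here.
    [StructLayout(LayoutKind.Sequential, Pack = 2)]
    struct Sample
    {
        public short X, Y, Z, W;
    }

    class PackedDemo
    {
        static void Main()
        {
            // 4 fields * 2 bytes = 8 bytes when marshaled.
            Console.WriteLine(Marshal.SizeOf<Sample>());  // 8
        }
    }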

+7

The C# language specification defines how a program should behave. It does not say how that must be implemented, as long as the behavior is correct. If you ask for the size of a short, you will always get 2.

In practice, C# is compiled to CIL, where integral types smaller than 32 bits are represented as 32-bit integers on the stack¹.

The JITter then lowers that again to whatever is appropriate for the target hardware, usually a slot of stack memory or a register.

As long as none of these transformations changes the observable behavior, they are legal.

In practice, the size of local variables is largely irrelevant; what matters is the size of arrays. An array of a million shorts typically takes about 2 MB.
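A rough way to see this (my own sketch; the number is approximate, since GC-based measurement is not exact):

    using System;

    class ArraySizeDemo
    {
        static void Main()
        {
            long before = GC.GetTotalMemory(forceFullCollection: true);
            short[] data = new short[1_000_000];
            long after = GC.GetTotalMemory(forceFullCollection: true);

            // Roughly 2,000,000 bytes plus a small object header;
            // an int[1_000_000] would be roughly 4,000,000 bytes.
            Console.WriteLine(after - before);
            GC.KeepAlive(data);
        }
    }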


¹ This refers to the virtual evaluation stack that IL operates on, which is distinct from the stack used by machine-code operations.

+1

The CLR only works with 32-bit and 64-bit integers on the stack. The answer lies in this instruction:

 box System.Int16 

This means that the value is boxed as an Int16. The C# compiler automatically emits this box for the call to Marshal.SizeOf(object), which in turn calls GetType() on the boxed value, and that returns typeof(System.Int16).
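If the box itself is unwanted, the generic overloads added in later framework versions take the type directly; a small sketch (the class name is mine):

    using System;
    using System.Runtime.InteropServices;

    class NoBoxDemo
    {
        static void Main()
        {
            short a = 0;

            // Non-generic overload: 'a' is boxed, and the type handle in the
            // box identifies it as System.Int16.
            Console.WriteLine(Marshal.SizeOf((object)a));  // 2

            // Generic overload: no box instruction is emitted at all.
            Console.WriteLine(Marshal.SizeOf<short>());    // 2
        }
    }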

+1

Source: https://habr.com/ru/post/948889/

