IL and arguments

IL has some opcodes for working with arguments, such as Ldarg.0, Ldarg.1, etc.

I know that these arguments are pushed onto the stack before the call opcode is executed, and that in some cases Ldarg.0 is used to get a reference to this (e.g. for instance members).

My question is: where are these arguments stored when the call is initiated? Is a copy of the caller's evaluation stack accessible from the invoked method?

Where can I find more information about this topic?

Update

I know that the virtual machine is abstract and that the JIT compiler takes care of these issues, but imagine that the IL were interpreted, as on the .NET Micro Framework.

3 answers

MSIL works against the specification of a virtual machine. The mental model for the arguments passed to a method is that they are held in an array. Ldarg selects an element from this array to access a method argument and pushes it onto the evaluation stack. Opcodes.Ldarg_0 is an abbreviated version of the more general Opcodes.Ldarg IL instruction; it saves two bytes by always selecting argument 0. The same idea applies to Opcodes.Ldarg_1 for the second argument. These are very common, of course; Ldarg only gets "expensive" when a method has more than 4 arguments. Emphasis on the double quotes: this is not the kind of expense you should ever worry about.

The actual argument storage at run time is very different. It depends on the jitter used, and different architectures pass arguments in different ways. In general, the first few arguments are passed in processor registers and the rest on the processor stack. Processors such as x64 or ARM have many registers, so they pass more arguments in registers than x86 does. This is governed by the rules of __clrcall, the calling convention for that architecture.
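To make that mental model concrete, here is a minimal sketch of a toy interpreter in Python. The function name run and the tuple encoding of instructions are made up for illustration; this models only the specification-level picture (an argument array plus an evaluation stack), not how any real runtime stores arguments.

```python
# Hypothetical mini-interpreter for the spec-level mental model only.
def run(instructions, args):
    stack = []  # the evaluation stack
    for op, *operand in instructions:
        if op == "ldarg":
            # Copy one element of the argument array onto the evaluation stack.
            stack.append(args[operand[0]])
        elif op == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "ret":
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode: {op}")
    raise RuntimeError("method ended without ret")

# ldarg.0 and ldarg.1 are just short encodings of ldarg 0 and ldarg 1.
body = [("ldarg", 0), ("ldarg", 1), ("add",), ("ret",)]
result = run(body, [40, 2])
```

Here result is the sum of the two arguments; the argument array itself is never modified by ldarg.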


ECMA-335 is probably a good starting point for this.

For example, in section I.12.4.1 there is the following:

Instructions emitted by the CIL code generator contain sufficient information for different implementations of the CLI to use different native calling conventions. All method calls initialize the method state areas (see §I.12.3.2) as follows:

  • The incoming arguments array is set by the caller to the desired values.
  • The local variables array always has null for object types and for fields within value types that hold objects. In addition, if the localsinit flag is set in the method header, then the local variables array is initialized to 0 for all integer types and to 0.0 for all floating-point types. Value types are not initialized by the CLI, but verified code will supply a call to an initializer as part of the method's entry point code.
  • The evaluation stack is empty.
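As a rough illustration of these three rules, the sketch below builds a method state in Python. MethodState, enter_method, and the 'object' / 'int' / 'float' tags are hypothetical names chosen for this example, and it assumes the localsinit flag is set.

```python
from dataclasses import dataclass, field

@dataclass
class MethodState:
    args: list                      # incoming arguments array
    locals_: list                   # local variables array
    eval_stack: list = field(default_factory=list)  # starts out empty

def enter_method(arg_values, local_kinds):
    """Create the state for a call; local_kinds holds one tag per local."""
    # Assumes localsinit: objects start as null, numbers as zero.
    defaults = {"object": None, "int": 0, "float": 0.0}
    return MethodState(
        args=list(arg_values),                      # set by the caller
        locals_=[defaults[kind] for kind in local_kinds],
    )

state = enter_method([1, 2], ["object", "int", "float"])
```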

and I.12.3.2 has:

Part of each method state is an array that holds local variables and an array that holds arguments. Like the evaluation stack, each element of these arrays can hold any single data type or an instance of a value type. Both arrays start at 0 (that is, the first argument or local variable is numbered 0). The address of a local variable can be computed using the ldloca instruction, and the address of an argument using the ldarga instruction.

Associated with each method is metadata that indicates:

  • whether the local variables and memory pool memory will be initialized when the method is entered.
  • the type of each argument and the length of the array of arguments (but see below for lists of variable arguments).
  • the type of each local variable and the length of the local variable array.

The CLI inserts padding as appropriate for the target architecture. That is, on some 64-bit architectures all local variables can be 64-bit aligned, while on others they can be 8-, 16-, or 32-bit aligned. The CIL generator shall make no assumptions about the offsets of local variables within the array. In fact, the CLI is free to reorder the elements in the local variable array, and different implementations might choose to order them in different ways.

And then in Partition III, the description of callvirt (as an example) has:

callvirt pops the object and the arguments off the evaluation stack before calling the method. If the method has a return value, it is pushed on the stack upon method completion. On the callee side, the obj parameter is accessed as argument 0, arg1 as argument 1, and so on.

Now this is all at the specification level. An actual implementation may well decide to make the function call simply inherit the top n elements of the current method's stack, which means that the arguments are already in the right place.
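A minimal sketch of that last idea, as an interpreter might do it: the callee's argument array is simply the top n slots of a stack shared with the caller. The names Frame, call, and add_two are hypothetical, and no claim is made that any particular runtime works this way.

```python
class Frame:
    """Callee view: arguments are slots [base, base + argc) of the shared stack."""
    def __init__(self, stack, argc):
        self.stack = stack
        self.base = len(stack) - argc   # arguments are already in place
        self.argc = argc

    def ldarg(self, index):
        if not 0 <= index < self.argc:
            raise IndexError("no such argument")
        self.stack.append(self.stack[self.base + index])

def call(stack, argc, body):
    frame = Frame(stack, argc)
    result = body(frame)        # the callee runs on the same stack
    del stack[frame.base:]      # drop the arguments and any callee temporaries
    stack.append(result)        # push the return value for the caller

def add_two(frame):
    frame.ldarg(0)
    frame.ldarg(1)
    b = frame.stack.pop()
    a = frame.stack.pop()
    return a + b

shared = [99, 40, 2]    # the caller pushed 40 and 2 above an unrelated value
call(shared, 2, add_two)
```

After the call, the two arguments have been replaced by the return value and the caller's unrelated value below them is untouched. No copy of the arguments was made on entry.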


IL (now known as CIL, Common Intermediate Language, not MSIL) describes operations on an imaginary stack machine. The JIT compiler takes the IL instructions and compiles them into machine code.

When calling a method, the JIT compiler must adhere to a calling convention. This convention specifies how arguments are passed to the called method, how the return value is passed back to the caller, and who is responsible for removing the arguments from the stack (caller or callee). In this example I use the cdecl convention, but actual JIT compilers use different conventions.

General approach

The exact details are implementation dependent, but the general approach used by the .NET and Mono JIT compilers to compile CIL to machine code is as follows:

  • "Simulate" the stack and use it to turn all stack-based operations into operations on virtual registers (variables). There is a theoretically infinite number of virtual registers.
  • Turn all IL instructions into equivalent machine instructions.
  • Assign each virtual register to a real machine register. There is only a limited number of machine registers available. For example, the x86 32-bit architecture has only 8 machine registers.

Of course, a lot of optimization happens between these steps.

Example

Here is an example to explain these steps:

    ldarg.1                          // Load argument 1 on the stack
    ldarg.3                          // Load argument 3 on the stack
    add                              // Pop value2 and value1, and push (value1 + value2)
    call int32 MyMethod(int32)       // Pop value and call MyMethod, push result
    ret                              // Pop value and return

In step 1, the IL is turned into register-based operations of the form operation dest <- src1, src2:

    ldarg.1 %reg0 <-                 // Load argument 1 in %reg0
    ldarg.3 %reg1 <-                 // Load argument 3 in %reg1
    add %reg0 <- %reg0, %reg1        // %reg0 = (%reg0 + %reg1)
    // Call MyMethod(%reg0), store result in %reg0
    call int32 MyMethod(int32) %reg0 <- %reg0
    ret <- %reg0                     // Return %reg0
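The stack simulation in step 1 could be sketched roughly as follows in Python. This is an illustration under assumptions, not what any real JIT does: to_registers and the tuple encoding are invented here, and for simplicity every result gets a fresh virtual register instead of reusing %reg0 as in the listing above.

```python
def to_registers(il):
    """Translate a tiny stack-based IL subset into register-based operations."""
    stack = []      # simulated evaluation stack: holds register names, not values
    out = []
    next_reg = 0

    def fresh():
        nonlocal next_reg
        name = f"%reg{next_reg}"
        next_reg += 1
        return name

    for op, *operand in il:
        if op == "ldarg":
            dest = fresh()
            out.append(f"{dest} <- arg{operand[0]}")
            stack.append(dest)
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            dest = fresh()
            out.append(f"{dest} <- add {left}, {right}")
            stack.append(dest)
        elif op == "call":
            name, argc = operand
            # Pop the arguments; the last one pushed is the last argument.
            call_args = [stack.pop() for _ in range(argc)][::-1]
            dest = fresh()
            out.append(f"{dest} <- call {name}({', '.join(call_args)})")
            stack.append(dest)
        elif op == "ret":
            out.append(f"ret {stack.pop()}")
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return out

il = [("ldarg", 1), ("ldarg", 3), ("add",), ("call", "MyMethod", 1), ("ret",)]
ops = to_registers(il)
```

The key point is that the simulated stack exists only at compile time: it tracks which register holds each stack slot, so no stack traffic remains in the output.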

Then it is turned into machine instructions, for example x86:

    mov %reg0, [addr_of_arg1]        // Move argument 1 in %reg0
    mov %reg1, [addr_of_arg3]        // Move argument 3 in %reg1
    add %reg0, %reg1                 // Add %reg1 to %reg0
    push %reg0                       // Push %reg0 on the real stack
    call [addr_of_MyMethod]          // Call the method
    add esp, 4
    mov %reg0, eax                   // Move the return value into %reg0
    mov eax, %reg0                   // Move %reg0 into the return value register EAX
    ret                              // Return

Then each virtual register (%reg0, %reg1) is assigned a machine register. For instance:

    mov eax, [addr_of_arg1]          // Move argument 1 in EAX
    mov ecx, [addr_of_arg3]          // Move argument 3 in ECX
    add eax, ecx                     // Add ECX to EAX
    push eax                         // Push EAX on the real stack
    call [addr_of_MyMethod]          // Call the method
    add esp, 4
    mov ecx, eax                     // Move the return value into ECX
    mov eax, ecx                     // Move ECX into the return value register EAX
    ret                              // Return

Spill

By choosing registers carefully, some mov instructions can be eliminated. When at some point in the code more virtual registers are in use than there are machine registers available, one machine register has to be spilled. When a machine register is spilled, instructions are inserted that push the register's value onto the real stack. Later, when the spilled value is needed again, instructions are inserted that pop the value from the real stack back into a register.
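To show when a spill becomes necessary, here is a deliberately simplified register assignment sketch in Python. It is loosely modeled on linear scan allocation, but it always spills the newly arriving virtual register, whereas real allocators choose the spill candidate more carefully. The function allocate and the (start, end) live intervals are assumptions made for this example.

```python
def allocate(intervals, machine_regs):
    """Map each virtual register to a machine register or to 'spill'.

    intervals maps a virtual register to (start, end), the instruction
    positions between which its value is live.
    """
    free = list(machine_regs)
    active = []         # (end, vreg) pairs currently holding a machine register
    assignment = {}
    for vreg, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Release registers whose value is dead before this one becomes live.
        for item in [a for a in active if a[0] < start]:
            active.remove(item)
            free.append(assignment[item[1]])
        if free:
            assignment[vreg] = free.pop(0)
            active.append((end, vreg))
        else:
            assignment[vreg] = "spill"  # value lives on the real stack instead
    return assignment

intervals = {"%reg0": (0, 5), "%reg1": (1, 3), "%reg2": (2, 4), "%reg3": (4, 6)}
assignment = allocate(intervals, ["eax", "ecx"])
```

With only two machine registers, %reg2 becomes live while both are occupied and is spilled; %reg3 can later reuse ecx once %reg1 is dead.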

Conclusion

As you can see, the machine code does not use the real stack nearly as often as the IL code uses the evaluation stack. The reason is that machine registers are the fastest memory elements of a processor, so the compiler tries to use them as much as possible. A value is stored on the real stack only if there is a shortage of machine registers or when the value is required to be on the stack (for example, because of the calling convention).


Source: https://habr.com/ru/post/944369/

