How can I move memory offset calculation out of run time in C/C++?

I am implementing a simple virtual machine, and currently I am using runtime arithmetic to calculate the addresses of individual program objects as offsets from base pointers.

Today I asked a couple of questions on this issue, but it seems that I am not getting anywhere.

From the first question, "Calculating access to an object and structure and calculating address offsets", I learned that modern processors have virtual addressing capabilities that allow memory offsets to be calculated without any additional arithmetic cycles.

And from the second question, "Are address offsets resolved at compile time in C/C++?", I found out that there is no such guarantee when you compute offsets manually.

By now it should be clear that I want to take advantage of the hardware's virtual memory features and move this calculation out of the runtime.

As for the platform, I use GCC and develop on x86 under Windows, but since this is a virtual machine, I would like it to run efficiently on all platforms that GCC supports.

Therefore, ANY information on this subject is welcome and will be greatly appreciated.

Thanks in advance!

EDIT: An overview of how my program code is generated: at the design stage, the program is built as a tree hierarchy, which is then recursively serialized into one contiguous memory block. The serializer also indexes the objects and calculates each one's offset from the beginning of the program memory block.
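A minimal sketch of such a serializer in C, under assumptions that are not from the question: a hypothetical `node` type with a 4-byte payload and at most four children, and a made-up record layout (payload, then child count). Nodes are written in pre-order into one contiguous buffer, and each node's offset from the start of the block is recorded as it is emitted, which is the indexing step described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical program-tree node: a 4-byte payload and up to 4 children. */
typedef struct node {
    int32_t payload;
    size_t n_children;
    struct node *children[4];
    uint32_t offset; /* set by serialize(): byte offset from block start */
} node;

/* Recursively writes `n` and its subtree into `block` in pre-order,
   starting at byte `used`. Each record is payload (4 bytes) followed by
   the child count (4 bytes). Returns the number of bytes written for the
   subtree, or 0 if `cap` is too small. */
static size_t serialize(node *n, unsigned char *block, size_t used, size_t cap) {
    const size_t rec = 2 * sizeof(uint32_t);
    assert(n->n_children <= 4);
    if (used + rec > cap)
        return 0;
    n->offset = (uint32_t)used; /* index the object by its offset */
    uint32_t count = (uint32_t)n->n_children;
    memcpy(block + used, &n->payload, sizeof n->payload);
    memcpy(block + used + sizeof n->payload, &count, sizeof count);
    size_t total = rec;
    for (size_t i = 0; i < n->n_children; i++) {
        size_t written = serialize(n->children[i], block, used + total, cap);
        if (written == 0)
            return 0; /* propagate "out of space" */
        total += written;
    }
    return total;
}
```

For a root with two leaf children, the three 8-byte records land at offsets 0, 8 and 16, and those offsets are what the bytecode would later refer to.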

EDIT 2: Here is some VM pseudo code:

    switch (*instruction) {
    case 1:
        call_fn1(*(instruction + 1));
        instruction += 1 + sizeof(parameter1);
        break;
    case 2:
        call_fn2(*(instruction + 1), *(instruction + 1 + sizeof(parameter1)));
        instruction += 1 + sizeof(parameter1) + sizeof(parameter2);
        break;
    case 3:
        instruction += *(instruction + 1);
        break;
    }

Case 1 calls a function that takes one parameter, which is located immediately after the instruction, so it is read at an offset of 1 byte from the instruction. The instruction pointer is then advanced by 1 + the size of the first parameter to reach the next command.

Case 2 calls a function that takes two parameters. As before, the first parameter is read at a 1-byte offset, and the second at a 1-byte offset plus the size of the first parameter. Then the instruction pointer is advanced by the size of the command plus the sizes of both parameters.

Case 3 is the goto statement: the instruction pointer is advanced by the offset that immediately follows the goto opcode.
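A runnable version of the pseudocode, as a sketch under stated assumptions: all parameters are taken to be 4-byte `int32_t` values, opcode 0 is a hypothetical halt instruction, and `fn1`/`fn2` are placeholder functions that only update an accumulator. Operands are read with `memcpy` because bytecode operands are generally not aligned, and the jump offset is measured from the jump opcode itself, matching `instruction += *(instruction+1)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcodes; OP_HALT is an assumption, not part of the question. */
enum { OP_HALT = 0, OP_CALL1 = 1, OP_CALL2 = 2, OP_JUMP = 3 };

static int32_t acc; /* placeholder state touched by fn1/fn2 */

static void fn1(int32_t a) { acc += a; }
static void fn2(int32_t a, int32_t b) { acc += a * b; }

/* Reads a 4-byte operand; memcpy avoids unaligned-access problems. */
static int32_t read_i32(const unsigned char *p) {
    int32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Executes bytecode until OP_HALT (or an unknown opcode) and returns acc. */
static int32_t run(const unsigned char *ip) {
    acc = 0;
    for (;;) {
        switch (*ip) {
        case OP_CALL1: /* one parameter at ip + 1 */
            fn1(read_i32(ip + 1));
            ip += 1 + sizeof(int32_t);
            break;
        case OP_CALL2: /* parameters at ip + 1 and ip + 5 */
            fn2(read_i32(ip + 1), read_i32(ip + 1 + sizeof(int32_t)));
            ip += 1 + 2 * sizeof(int32_t);
            break;
        case OP_JUMP: /* relative jump, offset counted from this opcode */
            ip += read_i32(ip + 1);
            break;
        case OP_HALT:
        default:
            return acc;
        }
    }
}
```

Because `sizeof(int32_t)` is a compile-time constant, expressions such as `ip + 1 + sizeof(int32_t)` are the kind an optimizing compiler can fold into the displacement of a memory operand; the advance of `ip` itself is the only address arithmetic that genuinely has to happen at run time.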

EDIT 3: As I understand it, the OS provides each process with its own dedicated virtual address space. If so, does this mean that the first address is always ... zero, so that the offset from the first byte of the memory block is actually the address of that element? If each process has its own address space, and I know the offset of my program's memory block AND the offset of each program object from the first byte of that block, are the addresses of the objects resolved at compile time?

The problem is that these offsets are not available when the C code is compiled; they only become known during my own "compilation" phase, when the program is translated into bytecode. Does this mean that there is no way to make the calculation of an object's memory address "free"?

How is this done in Java, for example, where bytecode is only compiled to machine code at run time? Does this mean that calculating the addresses of objects degrades performance because of runtime arithmetic?

4 answers

Here's an attempt to shed light on how your related questions and their answers apply to this situation.

The answer to the first question mixes two different things: the addressing modes of x86 instructions, and the mapping of virtual addresses to physical ones. The first is handled by the compiler, and the second is (usually) set up by the operating system. In your case, you only need to worry about the first.

x86 instructions are very flexible in how they address memory. Instructions that read or write memory compute the address using the following formula:

 segment + base + index * size + offset 

The segment part of the address is almost always the default DS segment and can usually be ignored. The base is one of the general-purpose registers or the stack pointer. The index is one of the general-purpose registers (other than the stack pointer), and the size is 1, 2, 4, or 8. Finally, the offset is a constant value encoded in the instruction. Each of these components is optional, but obviously at least one must be present.
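To see the formula in ordinary C code, here is a sketch with a hypothetical 8-byte struct `rec`: reading `base[index].b` needs base + index * 8 + 4, which maps directly onto the base, index, size and offset components. The assembly in the comment is what GCC on x86-64 with -O2 typically produces, not a guaranteed output; compiling with `-S` shows what your compiler actually emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte record: field b sits at constant offset 4. */
struct rec {
    int32_t a;
    int32_t b;
};

/* The layout constants the compiler folds into the instruction. */
static_assert(offsetof(struct rec, b) == 4, "b expected at offset 4");
static_assert(sizeof(struct rec) == 8, "rec expected to be 8 bytes");

/* address = base + index * 8 + 4
   GCC -O2 on x86-64 typically emits a single load such as
     mov eax, DWORD PTR [rdi+4+rsi*8]   ; base = rdi, index = rsi
   i.e. no separate multiply or add instructions. */
static int32_t load_b(const struct rec *base, size_t index) {
    return base[index].b;
}
```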

This addressing capability is what is usually meant when people talk about computing addresses without explicit arithmetic instructions. One commenter mentioned a special instruction, LEA, which performs the same address calculation but, instead of reading or writing memory, stores the computed address in a register.
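A sketch of where LEA tends to show up in VM code, assuming 4-byte operands and a hypothetical helper `operand_addr`: computing the address of the i-th operand after a one-byte opcode is pure base + index * size + offset arithmetic, so GCC on x86-64 usually emits one LEA rather than a shift and two additions. As above, the assembly in the comment is typical output, not a promise.

```c
#include <assert.h>
#include <stddef.h>

/* Address of the i-th 4-byte operand following a 1-byte opcode.
   GCC -O2 on x86-64 usually compiles this to a single
     lea rax, [rdi+1+rsi*4]
   which computes the address without touching memory. */
static const unsigned char *operand_addr(const unsigned char *ip, size_t i) {
    return ip + 1 + i * 4;
}
```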

For the code that you included in the question, it is likely that the compiler will use these addressing modes to avoid explicit arithmetic instructions.

As an example, the current value of the instruction variable might be kept in the ESI register. Both sizeof(parameter1) and sizeof(parameter2) are compile-time constants. In the standard 32-bit x86 calling conventions (such as cdecl), arguments are pushed in reverse order, so the first argument ends up on top of the stack, and the assembly might look something like this:

    case1:
        PUSH DWORD PTR [ESI+1]
        CALL fn1
        ADD  ESP, 4                  ; drop arguments from stack
        ADD  ESI, 5
        JMP  end_switch
    case2:
        PUSH DWORD PTR [ESI+5]
        PUSH DWORD PTR [ESI+1]
        CALL fn2
        ADD  ESP, 8                  ; drop arguments from stack
        ADD  ESI, 9
        JMP  end_switch
    case3:
        ADD  ESI, DWORD PTR [ESI+1]  ; relative jump: instruction += offset
        JMP  end_switch
    end_switch:

This assumes that both parameters are 4 bytes in size. Of course, the actual code depends on the compiler, but it is reasonable to expect it to produce sufficiently efficient code if you enable optimization (for example, -O2).


Your VM has a data element X at relative address A, and an instruction that says (for example) push X, right? And you want to execute this instruction without adding A to the base address of the VM data area.

I wrote a virtual machine that solves this problem by mapping the VM data area at a fixed virtual address. The compiler knows this virtual address and can therefore fix up A at compile time. Would this solution work for you? Can you modify your compiler yourself?

My virtual machine runs on a smart card, and I have full control over the OS, so my situation is different from yours. But Windows has some options for allocating memory at a fixed address, for example the VirtualAlloc function. You could try this. If you do, you may find that Windows has already allocated regions that collide with your fixed data area, so you may have to load any DLLs you use manually, after you have allocated the VM data area.
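VirtualAlloc is Windows-specific. As a sketch of the same fixed-address idea on a POSIX system (mmap is a swapped-in API here, not the one the answer used), the VM data area can be requested at a preferred address and the result checked, after which the bytecode compiler can turn each relative address A into an absolute one. The base address `VM_DATA_BASE`, the area size and the helper names are hypothetical, and the code assumes a 64-bit process.

```c
#define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS under strict -std=c11 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical fixed base that the bytecode compiler is assumed to know
   (64-bit address space assumed). */
#define VM_DATA_BASE ((uintptr_t)0x200000000000ULL)
#define VM_DATA_SIZE ((size_t)1 << 16)

/* Maps the VM data area at VM_DATA_BASE. The address is only a hint to
   mmap, so the result is checked; returns NULL if it landed elsewhere. */
static unsigned char *map_vm_data(void) {
    void *p = mmap((void *)VM_DATA_BASE, VM_DATA_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if ((uintptr_t)p != VM_DATA_BASE) { /* hint not honored */
        munmap(p, VM_DATA_SIZE);
        return NULL;
    }
    return p;
}

/* Bytecode-compiler side: relative address A becomes an absolute address. */
static uintptr_t fixup(uint32_t relative_a) {
    return VM_DATA_BASE + relative_a;
}

/* Interpreter side of "push X": the operand is already absolute, so no
   base + A addition is executed at run time. */
static int32_t load_absolute(uintptr_t addr) {
    int32_t v;
    memcpy(&v, (const void *)addr, sizeof v);
    return v;
}
```

If the check fails, the interpreter has to fall back to base + offset arithmetic, so a real VM would still need that slower path.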

But there are likely to be unforeseen problems, and this may not be worth the trouble.


Playing with virtual address translation, page tables, or TLBs is something that can only be done at the OS kernel level, and it is not portable across platforms and processor families. In addition, hardware address translation on most processor ISAs is usually only supported at page-size granularity.


Answering my own question, based on the many answers I received:

It turns out that in my situation it is not really possible: getting memory address calculation for free is only possible when specific requirements are met, and it requires compiling to machine-specific instructions.

I am developing a visual, Lego-style drag-and-drop programming system for educational purposes, which relies on a simple virtual machine to execute program code. I was hoping to maximize performance, but this is simply not possible in my scenario. That is not so important, because program elements can also generate equivalent C code, which can then be compiled in the traditional way for maximum performance.

Thanks to everyone who answered and helped clarify a subject I did not fully understand!


Source: https://habr.com/ru/post/920532/

