Just-in-Time Compilation of Java Bytecode

We are currently working on the JIT-compilation part of our own Java virtual machine implementation. Our idea is to do a simple translation of the Java bytecode into native opcodes, write them to executable memory, and CALL right to the beginning of the method.

Assume the following Java code is given:

    int a = 13372338;
    int b = 32 * a;
    return b;

We have taken the following approach (assuming that this memory region starts at 0x1000, and that the return value is expected in eax):

    0x1000: first local variable  - accessible via [eip - 8]
    0x1004: second local variable - accessible via [eip - 4]
    0x1008: start of the code     - accessible via [eip]

    Java bytecode | Assembler code (NASM syntax)
    --------------|------------------------------
                  | ; start
                  | mov edx, eip
                  | push ebx
                  |
                  | ; method content
    ldc           | mov eax, 13372338
                  | push eax
    istore_0      | pop eax
                  | mov [edx - 8], eax
    bipush        | push 32
    iload_0       | mov eax, [edx - 8]
                  | push eax
    imul          | pop ebx
                  | pop eax
                  | mul ebx
                  | push eax
    istore_1      | pop eax
                  | mov [edx - 4], eax
    iload_1       | mov eax, [edx - 4]
                  | push eax
    ireturn       | pop eax
                  |
                  | ; end
                  | pop ebx
                  | ret

It simply uses the machine stack in the same way the virtual machine itself does. Questions regarding this approach:

  • Is this compilation method viable?
  • Is it possible to implement all Java instructions this way? How can things like athrow / instanceof and similar instructions be translated?
2 answers

This compilation method works, it is easy to get up and running, and it at least eliminates the interpretation overhead. But it leads to quite a lot of code and rather terrible performance. One big problem is that it translates the stack operations 1:1, even though the target machine (x86) is a register machine. As you can see in the posted snippet (and in any other code), this always results in several stack-manipulation instructions for every single operation, so it uses the registers, and indeed the whole ISA, about as inefficiently as possible.

You can also support complex control flow such as exceptions just fine. It is not very different from implementing it in an interpreter. If you want good performance, you don't want to do work every time you enter or leave a try block. There are schemes to avoid this, employed by both C++ and other JVMs (keyword: zero-cost or table-driven exception handling). These are quite complex and hard to implement, understand, and debug, so you should go with a simpler alternative first. Just keep it in mind.

As for the generated code: the first optimization you will almost certainly want is converting the stack operations into three-address code or some other representation that uses registers. There are several papers on this, and implementations of it, so I won't go into detail unless you want me to. Then, of course, you need to map these virtual registers onto physical registers. Register allocation is one of the best-studied topics in compiler construction, and there are at least half a dozen heuristics that are efficient enough and fast enough to be usable in a JIT compiler. One example off the top of my head is linear scan register allocation (designed specifically for JIT compilation).
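As a rough sketch of that first step (illustrative only; the instruction encoding and the textual three-address format here are invented for this example): simulate the operand stack at compile time, pushing virtual-register names instead of values, so that `istore`/`iload` become pure renaming and emit no code at all.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StackToTAC {
    static List<String> translate(String[][] ops) {
        Deque<String> stack = new ArrayDeque<>();      // compile-time model of the operand stack
        Map<String, String> locals = new HashMap<>();  // local slot -> virtual register
        List<String> tac = new ArrayList<>();
        int next = 0;
        for (String[] op : ops) {
            switch (op[0]) {
                case "ldc":
                case "bipush": {
                    String r = "v" + next++;
                    tac.add(r + " = " + op[1]);
                    stack.push(r);
                    break;
                }
                case "istore":                          // no code emitted, just renaming
                    locals.put(op[1], stack.pop());
                    break;
                case "iload":
                    stack.push(locals.get(op[1]));
                    break;
                case "imul": {
                    String b = stack.pop(), a = stack.pop();
                    String r = "v" + next++;
                    tac.add(r + " = " + a + " * " + b);
                    stack.push(r);
                    break;
                }
                case "ireturn":
                    tac.add("return " + stack.pop());
                    break;
            }
        }
        return tac;
    }
}
```

Run on the bytecode from the question, this yields `v0 = 13372338`, `v1 = 32`, `v2 = v1 * v0`, `return v2` — already free of all the push/pop traffic.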

In addition, most JIT compilers that focus on the performance of the generated code (as opposed to fast compilation) use one or more intermediate representations and optimize the program in that form. This is basically your run-of-the-mill compiler optimization suite, including veterans like constant propagation, value numbering, re-association, loop-invariant code motion, etc. These things are not only easy to understand and implement, they are also described in thirty years of literature, up to and including textbooks and Wikipedia.
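For instance, a toy constant-propagation/folding pass over a textual three-address form might look like this (illustrative only; the instruction format `dst = src` / `dst = a * b` is an assumption of this sketch):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ConstFold {
    static List<String> fold(List<String> tac) {
        Map<String, Long> known = new HashMap<>();  // virtual register -> known constant
        List<String> out = new ArrayList<>();
        for (String ins : tac) {
            String[] p = ins.split(" ");
            if (p.length == 3 && p[1].equals("=")) {          // dst = src
                Long c = constOf(p[2], known);
                if (c != null) {
                    known.put(p[0], c);
                    out.add(p[0] + " = " + c);
                    continue;
                }
            } else if (p.length == 5 && p[3].equals("*")) {   // dst = a * b
                Long a = constOf(p[2], known), b = constOf(p[4], known);
                if (a != null && b != null) {                 // fold the multiplication
                    long v = a * b;
                    known.put(p[0], v);
                    out.add(p[0] + " = " + v);
                    continue;
                }
            }
            out.add(ins);                                     // nothing to fold
        }
        return out;
    }

    static Long constOf(String s, Map<String, Long> known) {
        if (known.containsKey(s)) return known.get(s);
        try { return Long.parseLong(s); } catch (NumberFormatException e) { return null; }
    }
}
```

On the method above, the whole multiplication folds away at compile time: `v2 = v1 * v0` becomes `v2 = 427914816`.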

The code you get out of this will be pretty good for straight-line code using primitives, arrays, and object fields. However, you can't optimize method calls at all. Every method is virtual, which means inlining or even just hoisting method calls (for example, out of a loop) is practically impossible except in very special cases. You mentioned that this is for a kernel. If you can accept using a subset of Java without dynamic class loading, you can do better (but it will be non-standard), since the JIT then knows all the classes. Then you can, for example, detect leaf classes (or, more generally, methods that are never overridden) and inline those.
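A hypothetical sketch of the "JIT knows all classes" idea: given the closed set of loaded classes, a method is a devirtualization candidate if no subclass overrides it (matching by name only here, ignoring parameter types, for brevity):

```java
import java.lang.reflect.Method;
import java.util.List;

class Devirtualizer {
    // True if no loaded subclass of 'owner' declares a method with this name,
    // i.e. calls to it can be bound (and inlined) statically.
    static boolean neverOverridden(Class<?> owner, String name, List<Class<?>> allClasses) {
        for (Class<?> c : allClasses) {
            if (c != owner && owner.isAssignableFrom(c)) {      // c is a subclass of owner
                for (Method m : c.getDeclaredMethods()) {
                    if (m.getName().equals(name)) return false; // overridden somewhere
                }
            }
        }
        return true;
    }

    // A tiny hierarchy for demonstration
    static class Base { int foo() { return 1; } int bar() { return 2; } }
    static class Sub extends Base { @Override int foo() { return 3; } }
}
```

With only `Base` and `Sub` loaded, `bar` is never overridden and calls to it can be devirtualized, while `foo` cannot.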

If you need dynamic class loading, but expect it to be rare, you can still do better, though it takes more work. The upside is that this approach generalizes to other things, such as eliminating logging completely. The basic idea is to specialize the code based on some assumptions (for example, that a given static field never changes, or that no new classes are loaded), and then de-optimize when those assumptions are violated. This means you will sometimes have to recompile code while it is running (this is hard, but not impossible).
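The specialize-then-deoptimize idea, sketched in Java for illustration (all names here are invented, and a real JIT would patch machine code rather than swap a lambda): speculate that a field is constant, bake its value into the compiled code, and guard the assumption before each run.

```java
import java.util.function.IntUnaryOperator;

class Specializer {
    static int config = 10;         // the field we speculate to be constant

    static int observed;            // value the code was specialized for
    static IntUnaryOperator code;   // the "compiled" code

    static void compile() {
        observed = config;
        int constant = config;      // baked into the specialized code
        code = x -> x * constant;   // specialized version, no field load at run time
    }

    static int run(int x) {
        if (code == null || observed != config) {
            compile();              // guard failed: de-optimize and recompile
        }
        return code.applyAsInt(x);
    }
}
```

The guard check stands in for the (much cheaper) mechanisms real JVMs use, such as invalidating compiled code from the class loader instead of checking on every call.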

If you go further down this road, its logical conclusion is trace-based JIT compilation, which has been applied to Java, but AFAIK it has not outperformed method-based JIT compilers. It is more effective when you have to make dozens or hundreds of assumptions to get good code, as is the case with highly dynamic languages.


Some comments on your JIT compiler (I hope I don't repeat things "delnan" already wrote):

General comments

I am sure that "real" JIT compilers work similarly to yours. However, you could apply some optimizations (for example: "mov eax, nnn" followed by "push eax" can be replaced by "push nnn").

You should store local variables on the stack; usually "ebp" is used as a frame pointer:

    push ebx
    push ebp
    sub esp, 8      ; 2 variables with 4 bytes each
    mov ebp, esp    ; now the local variables are addressed via [ebp+0] and [ebp+4]
    ...
    add esp, 8      ; release the local variables
    pop ebp
    pop ebx
    ret

This is necessary because functions can be recursive. Storing a variable at a fixed location (relative to EIP) would make the variables behave like "static" variables. (I assume that you do not compile the function multiple times in the case of recursion.)
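To see why a single fixed storage slot breaks recursion, compare these two factorial variants (a contrived illustration; the static field plays the role of a fixed EIP-relative slot):

```java
class FrameDemo {
    static int fixedSlot; // analogous to storing the local at a fixed address

    // Every recursive call clobbers the one shared slot.
    static int brokenFactorial(int n) {
        fixedSlot = n;
        if (n <= 1) return 1;
        int rest = brokenFactorial(n - 1);  // overwrites fixedSlot
        return fixedSlot * rest;            // reads the callee's value, not ours
    }

    // Each invocation gets its own copy in its stack frame.
    static int goodFactorial(int n) {
        int local = n;
        if (n <= 1) return 1;
        return local * goodFactorial(n - 1);
    }
}
```

`brokenFactorial(5)` returns 1 instead of 120, because after the deepest call sets the shared slot to 1, every level above multiplies by that stale value.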

Try / Catch

To implement try/catch, your JIT compiler must look not only at the Java bytecode, but also at the try/catch information, which is stored in a separate attribute of the class file (the exception table of the method's Code attribute). Try/catch can be implemented like this:

    ; push all important registers (the ones that must not be destroyed)
    push eax
    push ebp
    ...
    ; push the old "catch" pointers
    push dword ptr catch_pointer
    push dword ptr catch_stack
    ; set the new "catch" pointers
    mov catch_stack, esp
    mov dword ptr catch_pointer, my_catch
    ...                       ; some code

    ; a "throw" instruction looks like this:
    push exception
    jmp dword ptr catch_pointer
    ...                       ; some code

    ; end of the "try" section: pop all registers
    pop dword ptr catch_stack
    ...
    pop eax
    ...

    ; the "catch" block
my_catch:
    pop ecx                   ; pop the exception from the stack
    mov esp, catch_stack      ; restore the stack pointer
    ; now restore all registers (same as at the end of the "try" section)
    pop dword ptr catch_stack
    ...
    pop eax
    push ecx                  ; push the exception back onto the stack

In a multi-threaded environment, each thread needs its own catch_stack and catch_pointer variables!
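In higher-level terms, the catch_pointer/catch_stack pair forms a per-thread stack of handlers: entering a "try" pushes a handler, leaving it pops one, and "throw" transfers control to the innermost handler. A sketch of the same bookkeeping in Java (illustrative only; a real JIT would keep these pointers in thread-local memory, not in a ThreadLocal object):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

class CatchChain {
    // per-thread equivalent of catch_pointer/catch_stack
    static final ThreadLocal<Deque<Consumer<Object>>> handlers =
            ThreadLocal.withInitial(ArrayDeque::new);

    static void enterTry(Consumer<Object> handler) { handlers.get().push(handler); }

    static void leaveTry() { handlers.get().pop(); }   // normal exit from the try block

    static void doThrow(Object exception) {
        Consumer<Object> h = handlers.get().pop();     // innermost handler wins
        h.accept(exception);
    }
}
```

Because the handler stack is thread-local, two threads throwing at the same time cannot corrupt each other's catch pointers, which is exactly the point of the warning above.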

Specific exception types can be handled with "instanceof" as follows:

    try {
        // some code
    } catch (MyException1 ex) {
        // code 1
    } catch (MyException2 ex) {
        // code 2
    }

... actually compiled as follows:

    try {
        // some code
    } catch (Throwable ex) {
        if (ex instanceof MyException1) {
            // code 1
        } else if (ex instanceof MyException2) {
            // code 2
        } else {
            throw ex; // not handled!
        }
    }

Objects

A JIT compiler for a simplified Java virtual machine that does not support objects (and arrays) would be quite simple, but objects make the virtual machine much more complex.

Objects are simply stored as pointers, on the stack or in local variables. Typically, JIT compilers implement them like this: for each class, there is a piece of memory that contains information about the class (for example, which methods exist and at which address the assembler code of each method is located, etc.). An object is a piece of memory containing all the instance variables plus a pointer to the memory containing the class information.

"Instanceof" and "checkcast" can be implemented by looking at a pointer to a memory containing class information. This information may contain a list of all parent classes and implemented interfaces.

The main problem with objects is memory management in Java: unlike C++, there is a "new" but no "delete". You have to track whether an object is still in use. If an object is no longer in use, it must be removed from memory and its finalizer must be called.

The difficulties here are local variables (the same local variable slot may hold an object or a number at different times) and try/catch blocks (the catch handler must take care of the objects referenced by local variables and by the stack (!) before restoring the stack pointer).
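As an illustration of the bookkeeping involved, here is a toy reference-counting scheme (one possible strategy for this sketch; production JVMs use tracing garbage collectors instead): each object carries a count, every store into a local slot adjusts it, and an object whose count reaches zero is reclaimed.

```java
class RefCounted {
    int refCount;
    boolean freed;

    void retain() { refCount++; }

    void release() {
        if (--refCount == 0) {
            freed = true; // here the finalizer would run and the memory be reclaimed
        }
    }
}

class Frame {
    // Storing into a local slot: retain the new referent, release the old one.
    static RefCounted store(RefCounted slot, RefCounted value) {
        if (value != null) value.retain();
        if (slot != null) slot.release();
        return value;
    }
}
```

A catch handler unwinding the stack would have to call release() for every object slot it discards, which is exactly the try/catch complication mentioned above.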


Source: https://habr.com/ru/post/954629/

