How exactly does java compile?

Question

Confused about the Java compilation process

OK, I know this: we write Java source code; the compiler, which is platform independent, translates it into bytecode; then the JVM, which is platform dependent, translates it into machine code.

So, from the start, we write Java source code. The compiler, javac.exe, is an .exe file. What exactly is this .exe file? Isn't the Java compiler written in Java? If so, how can there be an .exe file that executes it? If the compiler is written in Java, then how is the compiler's own code executed at the compilation stage, given that executing Java code is the JVM's job? How can a language compile its own code? It all looks like a chicken-and-egg problem to me.

Now, what exactly does the .class file contain? Is it an abstract syntax tree in text form, is it tabular information, or what is it?

Can someone give me a clear and detailed explanation of how my Java source code is converted into machine code?

+59

java compiler-construction jvm

nash Aug 04 '10 at 15:12

9 answers

Isn't the Java compiler written in Java? If so, how can there be an .exe file that executes it?

Where did you get this information from? The javac executable could be written in any programming language; it does not matter. All that matters is that it is an executable that turns .java files into .class files.

You can find more details about the binary specification of the .class file in these chapters in the Java Language Specification (although perhaps a bit technical):

You can also look at the Java Virtual Machine Specification, which covers:

+16

matt b Aug 04 '10 at 15:17

The javac.exe compiler is an .exe file. What exactly is this .exe file? Isn't the Java compiler written in Java? If so, how can there be an .exe file that executes it?

The Java compiler (at least the one that comes with the Sun / Oracle JDK) is actually written in Java. javac.exe is just a launcher that processes command line arguments, some of which are passed to the JVM, which launches the compiler, and others to the compiler itself.
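One way to see that the compiler is ordinary Java code running inside a JVM is to call it from a Java program instead of through the launcher. The sketch below assumes a JDK (not a bare JRE) is installed and uses the standard javax.tools API; the class name InProcessCompile and the tiny Hello source are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class InProcessCompile {
    /** Compiles a hypothetical Hello.java in a temp directory; returns the compiler's exit code (0 = success). */
    static int compileHello() {
        try {
            Path dir = Files.createTempDirectory("javac-demo");
            Path source = dir.resolve("Hello.java");
            String code = "public class Hello { public static void main(String[] args) {"
                    + " System.out.println(\"hi\"); } }";
            Files.write(source, code.getBytes(StandardCharsets.UTF_8));

            // The compiler is a Java object living in this same JVM.
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("no compiler found; this needs a JDK, not just a JRE");
            }
            // null streams mean: use System.in, System.out and System.err.
            return compiler.run(null, null, null, "-d", dir.toString(), source.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("javac exit code: " + compileHello());
    }
}
```

No native compiler binary is involved here beyond the JVM itself, which is the point the launcher explanation makes.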

If the compiler is written in Java, then how is the compiler's own code executed at the compilation stage, given that executing Java code is the JVM's job? How can a language compile its own code? It all looks like a chicken-and-egg problem to me.

Many (if not most) compilers are written in the language they compile. Obviously, at an early stage the compiler had to be compiled by something else, but after this "bootstrap" any new version of the compiler can be compiled with an older version.

Now, what exactly does the .class file contain? Is it an abstract syntax tree in text form, is it tabular information, or what is it?

Details of the class file format are described in the Java Virtual Machine Specification.

+11

Michael Borgwardt Aug 04

Well, javac and the JVM are typically native binaries. They are written in C or whatever. It is certainly possible to write them in Java; you would just need a native version first. This is called bootstrapping.

Fun fact: most compilers that compile to native code are written in their own language. However, they all needed a native version written in another language first (usually C). The first C compiler, by comparison, was written in assembly. I presume that the first assembler was written in machine code. (Or using butterflies ;)

.class files are the bytecode generated by javac. They are not text; they are binary code, similar to machine code (but with a different instruction set and architecture).
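To see that a class file is binary rather than text, you can read its first bytes. This is a minimal sketch, assuming the class file of java.lang.String can be read as a resource from the running JDK; the helper name ClassFileHeader is made up. Every valid class file starts with the four-byte magic number 0xCAFEBABE.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ClassFileHeader {
    /** Returns the first four bytes (the magic number) of the .class file defining the given class. */
    static int readMagic(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream raw = type.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            // Class files are big-endian, which is what DataInputStream reads.
            return new DataInputStream(raw).readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.printf("magic = 0x%08X%n", readMagic(String.class)); // prints magic = 0xCAFEBABE
    }
}
```

After the magic number come the minor and major version numbers, then the constant pool, and only later the method bytecode.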

At runtime, the JVM has two options: it can interpret the bytecode (pretending to be a CPU itself), or it can JIT (just-in-time) compile it into machine code. The latter is faster, of course, but more complicated.
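The "pretending to be a CPU" option can be sketched with a toy stack machine. This is only an illustration of the interpreter loop idea: the opcodes PUSH, ADD, MUL and HALT and their numeric values are invented here and are not real JVM opcodes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ToyInterpreter {
    // Hypothetical opcodes for illustration only; not real JVM opcode values.
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    /** Runs the program one instruction at a time, like a software CPU. */
    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // program counter
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH: stack.push(code[pc++]); break;                // operand follows the opcode
                case ADD:  stack.push(stack.pop() + stack.pop()); break;
                case MUL:  stack.push(stack.pop() * stack.pop()); break;
                case HALT: return stack.pop();
                default:   throw new IllegalArgumentException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT}; // (2 + 3) * 4
        System.out.println(run(program)); // prints 20
    }
}
```

A JIT compiler would instead translate such an instruction sequence into native machine code once and then execute that directly, skipping the per-instruction dispatch.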

+5

Mike Caron Aug 04 '10 at 15:19

The .class file contains bytecode, which is very much like a high-level assembly language. The compiler may well be written in Java, but the JVM has to be compiled to native code to avoid the chicken-and-egg problem. I believe it is written in C, as are the lower levels of the standard libraries. When the JVM runs, it performs just-in-time compilation to turn this bytecode into native instructions.

+3

ZoFreX Aug 04 '10 at 15:16

Brief explanation

Write the code in a text editor and save it in a format the compiler understands, that is, with the ".java" file extension. javac (the Java compiler) converts it into a ".class" file (a bytecode file). The JVM executes the .class file on the operating system it sits on.

Long explanation

Always remember that Java is not a base language that the operating system recognizes. Java code is interpreted for the operating system by a translator called the Java Virtual Machine (JVM). The JVM cannot understand the code you write in an editor; it needs compiled code. This is where the compiler comes into the picture.

Every computer process involves manipulating memory. We cannot just write code in a text editor and compile it. We need to put it into the computer's memory, that is, save it, before compiling.

How does javac (the Java compiler) recognize the saved text as something to compile? There is a distinct file format that the compiler recognizes, namely .java. Save the file with the .java extension, and the compiler will recognize it and compile it when asked.

What happens when compiling? The compiler is the second translator (not a technical term) involved in the process; it translates the user-friendly language (Java) into a language the JVM understands (bytecode, in the .class format).

What happens after compilation? The compiler produces a .class file that the JVM understands. The program is then executed, that is, the .class file is executed by the JVM on the operating system.

Facts You Should Know

1) Java is not multi-platform; it is platform independent.

2) The JVM is developed in C/C++. This is one of the reasons people call Java a slower language than C/C++.

3) Java bytecode (.class) is a kind of "assembly language", the only language the JVM understands. Any code that produces a .class file when compiled, or any generated bytecode, can run on the JVM.

+2

Arvind Purushotham May 13 '16 at 19:06

Windows does not know how to invoke Java programs before a Java runtime is installed, and Sun chose to provide native commands that collect the arguments and then invoke the JVM, instead of binding the jar suffix to the Java engine.

+1

Thorbjørn Ravn Andersen Aug 04 '10 at 15:39

The compiler was originally written in C with bits of C++, and I assume it still is (why do you think the compiler is written in Java too?). javac.exe is just the C/C++ code that is the compiler.

As a side point, you could write the compiler in Java, but you are right, you would have to avoid the chicken-and-egg problem. To do this, you would typically write one or more bootstrapping tools in something like C to be able to compile the compiler.

The .class file contains the bytecodes, the output of the javac compilation process, and these are the instructions that tell the JVM what to do. At run time, these bytecodes are translated into native CPU instructions (machine code) so they can run on the specific hardware under the JVM.

To complicate this a bit, the JVM also optimizes and caches the machine code produced from the bytecodes to avoid translating them repeatedly. This is known as JIT compilation, and it occurs while the program is running and the bytecodes are being interpreted.

-1

Paolo Aug 04 '10 at 15:27

.java file
compiler (JAVA BUILD)
.class (bytecode)
JVM (system software typically created using 'C')
WORKING PLATFORM
CPU

-4

user3684728 May 28 '14 at 17:38

Rekin · Accepted Answer · 2010-08-04 15:38

OK, I know this: we write Java source code; the compiler, which is platform independent, translates it into bytecode,

Actually, the compiler itself works as a native executable (hence javac.exe). And true, it transforms the source file into bytecode. The bytecode is platform independent because it targets the Java Virtual Machine.

then the JVM, which is platform dependent, translates it into machine code.

Not always. As for Sun's JVM, there are two JVMs: client and server. Both of them can, but do not necessarily have to, compile to native code.

So, from the start, we write Java source code. The javac.exe compiler is an .exe file. What exactly is this .exe file? Isn't the Java compiler written in Java? If so, how can there be an .exe file that executes it?

This exe file is wrapped Java bytecode. It exists for convenience, to avoid complicated batch scripts. It starts a JVM and runs the compiler.

If the compiler is written in Java, then how is the compiler's own code executed at the compilation stage, given that executing Java code is the JVM's job?

That is exactly what the wrapping code does.

How can a language compile its own code? It all looks like a chicken-and-egg problem to me.

True, it is confusing at first sight. However, it is not unique to Java. The Ada compiler is also written in Ada itself. It may look like a chicken-and-egg problem, but in truth it is only a bootstrapping problem.

Now, what exactly does the .class file contain? Is it an abstract syntax tree in text form, is it tabular information, or what is it?

It is not an abstract syntax tree. The AST is used only by the tokenizer and the compiler at compile time to represent the code in memory. The .class file is like assembly, but for the JVM. The JVM, in turn, is an abstract machine that runs a specialized machine language targeted only at the virtual machine. In the simplest terms, a .class file has a structure very similar to ordinary assembly: at the beginning all static variables are declared, then come some tables of external function signatures, and lastly the machine code.

If you are really curious, you can dig into a class file using the "javap" utility. Here is a sample of the (confusing) output of invoking javap -c Main:

 0: new #2; //class SomeObject
 3: dup
 4: invokespecial #3; //Method SomeObject."<init>":()V
 7: astore_1
 8: aload_1
 9: invokevirtual #4; //Method SomeObject.doSomething:()V
 12: return

So you should already have an idea of what it really is.
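The source that produced this listing is not shown, but it could plausibly look like the sketch below. The body of doSomething, the calls counter, and the driver class name JavapDemo are assumptions added for illustration; the comments map each statement to the instructions in the listing.

```java
final class SomeObject {
    private int calls = 0; // hypothetical state, only here so the method does something

    void doSomething() {
        calls++;
    }

    int calls() {
        return calls;
    }
}

final class JavapDemo {
    public static void main(String[] args) {
        SomeObject obj = new SomeObject(); // new, dup, invokespecial <init>, astore_1
        obj.doSomething();                 // aload_1, invokevirtual doSomething
    }                                      // return
}
```

Compiling this with javac and running javap -c JavapDemo should show a similar instruction sequence for main, although the constant pool indices (#2, #3, #4) may differ.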

Can anyone tell me how my Java source code is converted to machine code?

I think it should be clearer by now, but here is a short summary:

1) You invoke javac, pointing it at your source file. javac's internal reader (or tokenizer) reads your file and builds an actual AST out of it. All syntax errors come from this stage.
2) javac has not finished its job yet. Once it has the AST, the real compilation can begin. It uses the visitor pattern to traverse the AST and resolves external dependencies to add meaning (semantics) to the code. The finished product is saved as a .class file containing bytecode.
3) Now it is time to run the thing. You invoke java with the name of the .class file. The JVM starts again, but this time to interpret your code. The JVM may or may not compile your abstract bytecode into native assembly. Sun's HotSpot compiler, together with just-in-time compilation, can do so if needed. The running code is constantly profiled by the JVM and recompiled into native code if certain rules are met. Most commonly, the hot code is compiled first.
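To watch the last step happen, you can run a program with a hot method and ask HotSpot to log what it compiles. This is a small sketch with made-up names (HotLoop, square, sumOfSquares); it assumes a HotSpot-based JVM, where the -XX:+PrintCompilation flag is available.

```java
final class HotLoop {
    /** Called a million times below, so the JVM's profiler is likely to treat it as hot. */
    static long square(long x) {
        return x * x;
    }

    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += square(i);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000_000));
    }
}
```

Running it as java -XX:+PrintCompilation HotLoop prints a line for each method the JIT compiles; the exact lines vary between JVM versions and runs, so no specific output is shown here.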

Edit: Without javac, you would have to invoke the compiler using something similar to this:

 %JDK_HOME%/bin/java.exe -cp myclasspath com.sun.tools.javac.Main fileToCompile

As you can see, it calls Sun's private API, so it is bound to the Sun JDK implementation. This would make build systems dependent on it. If someone switched to any other JDK (the wiki lists 5 other than Sun's), the above command would have to be updated to reflect the change (since it is unlikely that the compiler would reside in the com.sun.tools.javac package). Other compilers could be written in native code.

So the standard way is to ship a javac wrapper with the JDK.
