Here is what I know: we write Java source code, and a platform-independent compiler translates it into bytecode,
In fact, the compiler itself runs as a native executable (hence javac.exe). And yes, it translates the source file into bytecode. The bytecode is platform independent because it targets the Java Virtual Machine.
then the JVM, which is platform dependent, translates it into machine code.
Not always. With the Sun JVM there are two virtual machines: client and server. Both of them can, but do not have to, compile the bytecode to native code.
So, from the very beginning we are writing Java source code. The javac.exe compiler is a .exe file. What exactly is this .exe file? Isn't the Java compiler written in Java? If so, how come there is a .exe file that executes it?
This exe is a wrapper around Java bytecode. It exists for convenience, to avoid complicated batch scripts. It starts a JVM and runs the compiler in it.
If the compiler is written in Java, then how is the compiler code executed at the compilation stage, since it is the JVM's job to execute Java code?
This is exactly what the wrapper code does.
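To make that concrete, here is a minimal sketch of running the compiler from ordinary Java code through the standard javax.tools API, roughly what a launcher does once a JVM is up. The class name InProcessCompile, the method compileHello and the Hello source are made up for illustration, and the sketch assumes it runs on a JDK (on a plain JRE, getSystemJavaCompiler() returns null).

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class InProcessCompile {

    // Writes a tiny source file into a temp directory, compiles it with the
    // compiler that ships with the running JDK, and reports whether a .class
    // file was produced. All names here are illustrative.
    static boolean compileHello() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No compiler available; is this a JRE rather than a JDK?");
        }
        try {
            Path dir = Files.createTempDirectory("javac-demo");
            Path source = dir.resolve("Hello.java");
            String code = "public class Hello { public static void main(String[] a) { System.out.println(\"hi\"); } }";
            Files.write(source, code.getBytes(StandardCharsets.UTF_8));

            // Same arguments you would pass to javac on the command line.
            int exitCode = compiler.run(null, null, null, "-d", dir.toString(), source.toString());
            return exitCode == 0 && Files.exists(dir.resolve("Hello.class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("compiled: " + compileHello());
    }
}
```

The point is only that the compiler is a regular Java program: a JVM that is already running can load it and call it like any other library.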
How can a language compile its own code? It all seems like a chicken-and-egg problem to me.
True, it is confusing at first glance. However, this is not unique to Java. The Ada compiler is also written in Ada itself. It may look like a "chicken-and-egg problem", but really it is only a bootstrapping problem: the very first version of a compiler is built with some other, already existing tool, and from then on each version can be compiled by the previous one.
Now, what exactly does the .class file contain? Is it an abstract syntax tree in text form, tabular information, or something else?
It is not an abstract syntax tree. The AST is used only by the parser and the compiler, at compile time, to represent the code in memory. A .class file is like assembly, but for the JVM. The JVM, in turn, is an abstract machine that runs a specialized machine language targeted only at the virtual machine. In the simplest terms, a .class file is structured much like an ordinary object file: a header and a table of constants (including references to external classes and method signatures) come first, then the declared fields, and finally the methods with their bytecode.
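As a rough illustration of that layout, the sketch below reads only the fixed header of a class file: the magic number, the minor and major version, and the constant pool count. The class name ClassFileHeader and the method describe are hypothetical; the field order follows the class file format in the JVM specification.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassFileHeader {

    // Reads the first ten bytes of a class file: a 4-byte magic number,
    // then minor version, major version and constant pool count (2 bytes each).
    static String describe(InputStream in) {
        try (DataInputStream data = new DataInputStream(in)) {
            int magic = data.readInt();
            int minor = data.readUnsignedShort();
            int major = data.readUnsignedShort();
            int constantPoolCount = data.readUnsignedShort();
            return String.format("magic=%08X major=%d minor=%d constantPoolCount=%d",
                    magic, major, minor, constantPoolCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // java.lang.Object is used only because its class file is always available.
        InputStream in = Object.class.getResourceAsStream("Object.class");
        if (in == null) {
            System.out.println("Could not locate Object.class as a resource");
            return;
        }
        System.out.println(describe(in));
    }
}
```

Every valid class file starts with the magic number CAFEBABE; the version numbers and the constant pool size depend on the JDK that produced the file.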
If you are really curious, you can dig into a class file with the "javap" utility. Here is a sample of the (confusing) output of running javap -c Main:
0: new
So you should now have an idea of what it really is.
Can anyone tell me how my Java source code is converted to machine code?
I think this should be clearer by now, but here is a short summary:
You invoke javac, pointing it at your source file. javac's front end (tokenizer and parser) reads the file and builds an actual AST from it. All syntax errors come from this stage.
javac is not finished yet. Once it has the AST, the real compilation can begin. It uses the visitor pattern to traverse the AST and resolves external dependencies to add meaning (semantics) to the code. The finished product is saved as a .class file containing bytecode.
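The parsing stage can be observed on its own. The following is a sketch only: it assumes a JDK whose system compiler is javac, so that the task returned by the standard javax.tools API can be cast to com.sun.source.util.JavacTask and asked to parse without compiling further. ParseOnly, StringSource and countSyntaxErrors are names invented for this example.

```java
import com.sun.source.util.JavacTask;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Collections;

public class ParseOnly {

    // An in-memory source file, so nothing has to be written to disk.
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Runs only the parsing stage (source text to AST) and returns the
    // number of errors reported while doing so.
    static long countSyntaxErrors(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No compiler available; is this a JRE rather than a JDK?");
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        // Assumption: the system compiler is javac, so its task is a JavacTask.
        JavacTask task = (JavacTask) compiler.getTask(null, null, diagnostics, null, null,
                Collections.singletonList(new StringSource(className, code)));
        try {
            task.parse();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return diagnostics.getDiagnostics().stream()
                .filter(d -> d.getKind() == Diagnostic.Kind.ERROR)
                .count();
    }

    public static void main(String[] args) {
        System.out.println("valid source errors:  " + countSyntaxErrors("A", "class A { int x = 1; }"));
        System.out.println("broken source errors: " + countSyntaxErrors("B", "class B { int x = ; }"));
    }
}
```

Because only parse() is called, nothing is resolved or written to disk; errors that depend on meaning, such as an unknown type, would show up only in the later stages.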
Now it is time to run the thing. You invoke java with the name of the class (not the .class file name). The JVM starts again, this time to interpret your code. The JVM may or may not compile the abstract bytecode into native machine code. Sun's HotSpot JVM, with its Just-In-Time compiler, does this when needed: running code is continuously profiled and compiled to native code once certain criteria are met. Typically, hot code is compiled first.
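A small experiment along these lines, as a sketch: the class below calls a short method many times so that it becomes "hot", then asks the JVM's standard management interface how much time the JIT compiler has spent so far. HotLoop and sumTo are hypothetical names, and whether and when anything gets compiled is entirely up to the JVM you run it on.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class HotLoop {

    // A small method that is called often enough to become a JIT candidate.
    static long sumTo(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        long sink = 0;
        for (int round = 0; round < 10_000; round++) {
            sink += sumTo(10_000);
        }
        System.out.println("result: " + sink);

        // The bean is null on a JVM without a JIT compiler.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println(jit.getName() + " has spent about "
                    + jit.getTotalCompilationTime() + " ms compiling");
        } else {
            System.out.println("No JIT compilation time available on this JVM");
        }
    }
}
```

On HotSpot, running the same class with the -Xint option forces pure interpretation, which makes the difference easy to compare.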
Edit: Without javac you would have to invoke the compiler with something like this:
%JDK_HOME%/bin/java.exe -cp myclasspath com.sun.tools.javac.Main fileToCompile
As you can see, it calls a Sun private API, so it is tied to the Sun JDK implementation. Build systems would become dependent on it. If someone switched to any other JDK (Wikipedia lists five besides Sun's), the command above would have to be updated to reflect the change (since the compiler is unlikely to live in the com.sun.tools.javac package). Other compilers may even be written in native code.
So the standard approach is to ship the javac wrapper with the JDK.