Do all programs end up being converted to assembly instructions?

It seems that every class I take is an introduction to a new topic, never going deep enough to give me the comprehensive knowledge I would need to create a real program that can run outside of the IDE. It is both frustrating and intimidating to try to understand something I was never taught, and it puzzles me that an accredited computer science program can offer a curriculum that does not shed light on this process from the very beginning.

Actual question: Forgive the introduction, but it gives a good idea of my background. I am currently studying MIPS in my computer architecture class and got a quick introduction to assembly. The finer details of how a program is actually executed are often described to me as magic and swept under the rug for another teacher to explain, if ever.

I understand that processor circuits vary greatly from chip to chip, and different chips may therefore require different low-level instructions to execute the same high-level code. Are all programs eventually converted to assembly language before becoming raw machine code, or is this step no longer needed?

If so, at what point does the processor begin executing its own unique set of instructions? Is this the lowest level of code, or are the instructions broken down even further as the processor executes them?

Finally, do all architectures require/have an assembly language?

+6
8 answers

Assembly language is, so to speak, a human-readable form of the instructions executed by the processor (which are binary data and very hard for a human to work with directly). Thus, if the machine instructions are not written by a human, the assembly step is optional, although it is sometimes kept for convenience. When a program is compiled from a language such as C++, the compiler can generate machine code directly without going through an intermediate assembly step. However, many compilers provide an option to emit assembly code to make it easier for a person to verify what is being generated.

Many modern languages, such as Java and C#, are compiled into so-called bytecode. This is code that the CPU does not execute directly, but rather an intermediate form that can be compiled to machine code just in time (JIT-ed) while the program runs. In this case, processor-specific machine code is generated, but it usually never passes through a human-readable form.
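You can see this intermediate form for yourself. CPython, for example, compiles source to its own bytecode before interpreting it, and the standard-library `dis` module will show it (the exact opcode names vary by Python version):

```python
import dis

def add(a, b):
    return a + b

# Print the bytecode the CPython VM will interpret for this function.
dis.dis(add)

# The same intermediate form, accessed programmatically:
ops = [ins.opname for ins in dis.get_instructions(add)]
print(ops)
```

Note that none of these opcodes are instructions of your physical CPU; they are instructions of the virtual machine, which is itself a native program.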

+7

Assembly language is just a human-readable textual representation of raw machine code. It exists for the benefit of programmers (people). It is not at all necessary as an intermediate step for generating machine code. Some compilers generate assembly and then invoke an assembler to convert that code to machine code. But since omitting this step gives faster compilation (and is not that difficult to do), compilers will (generally speaking) tend to generate machine code directly. It is still useful, though, to be able to compile to assembly to inspect the results.

As for your last question: assembly language is a convenience for humans, so no architecture strictly needs one. You could create an architecture without it if you wanted. But in practice, all architectures have an assembly language. First, it is very easy to create one: give a textual name to every opcode and register, add syntax to represent the different addressing modes, and you are basically done. And even if all code were converted directly from a higher-level language to machine language, you would still want an assembly language, if only as a way to disassemble and visualize machine code when hunting compiler bugs and the like.
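Since you are studying MIPS, here is a minimal sketch of exactly that mapping from names to bits, hand-assembling one R-type instruction (the register numbers and the `add` function code are from the standard MIPS32 encoding; the helper is just for illustration):

```python
# Hand-assembling one MIPS R-type instruction: add $t0, $t1, $t2
# R-type layout: opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)

REGS = {"$t0": 8, "$t1": 9, "$t2": 10}    # register name -> number (subset)

def encode_add(rd, rs, rt):
    """Encode `add rd, rs, rt` into its 32-bit machine word."""
    opcode, shamt, funct = 0, 0, 0x20     # R-type opcode 0, funct 0x20 = add
    return ((opcode << 26) | (REGS[rs] << 21) | (REGS[rt] << 16)
            | (REGS[rd] << 11) | (shamt << 6) | funct)

word = encode_add("$t0", "$t1", "$t2")
print(f"add $t0, $t1, $t2  ->  0x{word:08X}")   # 0x012A4020
```

An assembler is, at its core, little more than this table lookup and bit packing applied line by line, which is why new ones are so easy to write.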

+3

Each general-purpose CPU has its own instruction set. That is, certain sequences of bytes, when executed, have a well-defined, documented effect on registers and memory. Assembly language is a convenient way to write these instructions so that people can read, write, and understand what they do without constantly looking up the encodings. It is safe to say that for every modern processor there is an assembly language.

Now, on to whether programs are converted to assembly. To begin with, the CPU does not execute assembly code. It executes machine code, but there is a one-to-one correspondence between machine-code instructions and assembly instructions. As long as you keep this distinction in mind, you can say things like "and now the CPU does a MOV, then an ADD", and so on. What the CPU actually executes, of course, is the machine code that corresponds to the MOV instruction.
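Because the correspondence is one-to-one, you can also go the other way, from raw bits back to a mnemonic, which is all a disassembler does. A minimal sketch for MIPS R-type words (register and function-code tables truncated for illustration):

```python
# Decoding a raw MIPS R-type machine word back into assembly text.
REG_NAMES = {8: "$t0", 9: "$t1", 10: "$t2"}            # subset, for illustration
FUNCTS = {0x20: "add", 0x22: "sub", 0x24: "and", 0x25: "or"}

def decode_rtype(word):
    """Turn a 32-bit R-type word into its assembly mnemonic and operands."""
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    funct = word & 0x3F
    return f"{FUNCTS[funct]} {REG_NAMES[rd]}, {REG_NAMES[rs]}, {REG_NAMES[rt]}"

print(decode_rtype(0x012A4020))  # add $t0, $t1, $t2
```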

However, if your language is compiled to native code, your program will indeed be converted to machine code before execution. Some compilers (not all) do this by emitting assembly source and letting an assembler take the final step. When present, this step is usually well hidden; the assembly form exists only briefly during the compilation process, unless you tell the compiler to keep it.

Other compilers skip the assembly step but can emit assembly on request. For example, Microsoft's C++ compiler accepts the /FA option, which emits an assembly listing alongside the object file.

If it is an interpreted language, there is no explicit conversion to machine code: the source lines are executed by the language's interpreter. Bytecode-oriented languages (Java, Visual Basic) live somewhere in between; they are compiled to code that does not match the machine's code but is much easier to interpret than high-level source. For them, too, it is fair to say that they are not converted to machine code (setting JIT compilation aside).
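To see why bytecode is "much easier to interpret" than source, here is a toy stack-machine interpreter for a made-up bytecode (not any real VM's format): dispatching on pre-decoded instruction pairs is far simpler than re-parsing text on every run.

```python
# A toy stack-machine interpreter for a hypothetical bytecode:
# each instruction is an (opcode, operand) pair.
def run(bytecode):
    stack = []
    for op, arg in bytecode:
        if op == "PUSH":
            stack.append(arg)          # push a constant
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)        # pop two, push their sum
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)        # pop two, push their product
    return stack.pop()

# (2 + 3) * 4, already "compiled" to bytecode:
prog = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
print(run(prog))  # 20
```

The interpreter itself, of course, runs as native machine code; only your program stays in the intermediate form.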

+2

In principle, yes. Java's "assembly" is called bytecode, and any chip microarchitecture implements an ISA, which consists of assembly-level instructions or something similar, while the same ISA can be implemented by many different chips. If you are learning MIPS, that is a good introduction: you can see how the compiler translates C to MIPS, then how each MIPS instruction translates to machine code whose opcode and operand fields drive the ALU and the rest of the datapath. For more information, read Hennessy and Patterson, who wrote two good books on computer hardware: "Computer Organization and Design" and "Computer Architecture: A Quantitative Approach".

0

Compilers that produce native machine code effectively produce the corresponding assembly language, which is then assembled into machine code. This is usually done in one step, but some compilers, such as GCC, can also output the intermediate assembly.

You are right that different architectures have different instruction sets. Exploiting these differences is how compilers can optimize an executable for a particular processor.

0

This is a pretty big rabbit hole you are looking down.

The short answer is no: not all programs are turned into assembly language. If we set just-in-time compilation aside, interpreted languages like Ruby, Lisp, and Python, as well as programs that run on a virtual machine (VM), such as Java and C#, are not turned into assembly. Rather, an existing program, an interpreter or a virtual machine, takes the source (interpreted) or the bytecode (VM), neither of which is your machine's assembly language, and runs it. The interpreter knows what to do when it sees certain input sequences and takes the right actions, even if it has never seen that particular input before.

Compiled programs, such as those you write in C or C++, may, as part of the compilation process, be turned into assembly language, which is then turned into machine language. Often this step is skipped to speed things up. Some compilers, such as LLVM-based ones, output a common bitcode; this separates the parts of the compiler that generate bitcode from the parts that turn bitcode into machine code, allowing reuse across different architectures.

However, even though the OS sees the processor as something that consumes machine code, many processors have a lower microcode level. Each (assembly-level) instruction in the instruction set is implemented by the CPU as a sequence of simpler microcode operations. Inside the processor, the instruction set can stay unchanged while the microcode that implements the instructions changes. Think of the instruction set as an API for the CPU.
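A purely illustrative model of that idea, using a made-up "add register and memory" instruction (real microcode is hardware-level and looks nothing like Python, but the decomposition into simpler steps is the point):

```python
# Toy model: one ISA-level instruction "ADD reg, [addr]" implemented as a
# fixed sequence of simpler micro-operations, the way microcoded CPUs
# break a complex instruction into internal steps.
def exec_add_mem(regs, mem, reg, addr):
    tmp = mem[addr]                     # micro-op 1: load operand into a hidden temp
    regs[reg] = regs[reg] + tmp         # micro-op 2: route temp through the ALU
    regs["Z"] = int(regs[reg] == 0)     # micro-op 3: update the zero flag
    return regs

regs = {"A": 5, "Z": 0}
mem = {0x10: 7}
print(exec_add_mem(regs, mem, "A", 0x10))  # {'A': 12, 'Z': 0}
```

The "API" stays the same (one ADD instruction) even if the internal sequence of micro-operations changes between chip revisions.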

0

All processors run on bits; we call this machine code, and it comes in many flavors for many reasons, from building a better mousetrap to patent protection. Each processor uses some flavor of machine code from the user's point of view, and some internally convert it to microcode, another machine code, and so on. When you hear x86 versus ARM versus MIPS versus PowerPC, these are not just company names: each has its own instruction set, its own machine code, for the respective processors. The x86 instruction set, though it has evolved, still resembles its history, and you can easily tell x86 code from the others. The same is true across vendors: you can see the MIPS legacy in MIPS, the ARM legacy in ARM, and so on.

So, to run a program on a processor, at some point it must be converted to machine code for that processor, which the processor can then execute. Different languages and tools do this in different ways. A compiler is not required to go from a high-level language through assembly language, but it is convenient. First, you will need an assembler for that processor anyway, so the tool already exists. Second, it is much easier to debug a compiler by looking at human-readable assembly language than at the bits and bytes of machine code. Some toolchains, such as those for Java, Python, and old Pascal compilers, use a universal machine code (each language its own), universal in the sense that Java on x86 and Java on ARM behave the same up to that point; from there, on the target (x86, ARM, MIPS), an interpreter decodes the universal bytecode and executes it on the native processor. But ultimately it has to become machine code for the processor it runs on.

There is also history behind these compilation layers. It is a building-block approach: make one block the front end and another the back end that emits asm; then asm-to-object is its own tool, and the linker that combines objects is its own tool. Each block can be maintained and developed with well-defined inputs and outputs, and can sometimes be replaced by another block that fits in the same place. Compiler classes teach this model, so you see it replicated in new compilers and new languages: the front end parses the high-level language text into an intermediate, compiler-specific representation; the back end then takes this internal code and turns it into assembly for the target processor. This lets GCC and many others swap out the back end so that the front and middle can be reused for different targets. Then, separately, there is the assembler, as well as a separate linker, each a tool in its own right.

People keep trying to reinvent the keyboard and mouse, but folks are comfortable enough with the old way that they stick with it, even when the new invention is much better. The same is true of compilers and operating systems and many other things: we go with what we know, and with compilers that often means compiling to assembly.

0

Here is what may be confusing you:

  • All programs must be converted to machine instructions, because that is what machines execute.
  • Assembly language is a low-level programming language that corresponds almost one-to-one with machine instructions.
  • A program can either be compiled into machine instructions or interpreted, in which case the machine instructions executed are those of the interpreter.
  • Programs are not usually translated into assembly language, since that would then require the assembly to be converted to machine instructions anyway. I seem to recall some very old compilers that emitted assembly language, but I don't know which ones do that today.
  • There are several ways for machines to execute machine instructions. They can be hardwired, or they can use microcode. I suspect that almost all modern processors use microcode. That really is magic.
0

Source: https://habr.com/ru/post/954708/
