How do compiler writers ensure that their compilers generate correct code?

Reading this fascinating (and highest-voted) question on SO, Why is processing a sorted array faster than processing an unsorted array?, got me thinking about compiler correctness.

For example, the response indicates that:

Intel Compiler 11 does something wonderful. It interchanges the two loops ...

How do compiler writers know when it is safe to interchange two loops?

And, in general, do they use mathematical proofs to demonstrate that their conclusions are valid?

How does a compiler writer know that their compiler generates correct code? How do they verify this conclusion? Do they have to write a test suite that runs the compiler and verifies that the generated code is correct?

+6
5 answers

How do compiler writers know when it is safe to interchange two loops?

The compiler performs a series of checks on the code to determine whether it is safe to interchange the loops. For example, if the code is not fully inlined, it may be unable to interchange them. If the code modifies a volatile variable, it will not interchange them. If the code stores values computed in earlier iterations of the loop, the compiler will not interchange them. If the compiler can be sure the transformation is safe, because none of these conditions apply, then it may interchange the loops.

And, in general, do they use mathematical proofs to demonstrate that their conclusions are valid?

No. They simply develop the optimization together with a set of conservative checks that guarantee the optimization is safe. Over time, they develop more optimizations and more sophisticated ways of detecting when an optimization is safe, even when that is less obvious.

How does a compiler writer know that their compiler generates correct code?

They do their best. Sometimes they get it wrong. People file bug reports, and they fix them.

How do they verify this conclusion? Do they have to write a test suite that runs the compiler and verifies the correctness of the generated code?

They absolutely use test suites. When a bug is found in GCC, a test for that specific case is added to the test suite to ensure that the bug is fixed and does not creep back in.

+4

How do compiler writers know when it is safe to interchange two loops?

When the transformation does not change the behavior of the program as defined by the language standard, and when the transformation does not contradict the standard itself.

For example, the C and C++ standards say in several places that the order of evaluation of function arguments and subexpressions is unspecified. This gives the compiler the freedom to generate code that evaluates them in any order it sees fit. If your program depends on a particular order, it does not conform to the standard, and you have no right to blame the compiler for "breaking" it.

Compilers can, and often do, use code analysis, logic, and mathematics, with all the attendant theorems, to optimize code.

In practice, it is testing that shows whether a compiler has done its job correctly.

+2

Compilation is a complex process. The program is represented as a graph, and the compiler tries to "optimize" this graph as far as possible according to the rules its developers have devised.

However, there is no guarantee that the generated code is anywhere near "optimal". There has been research into so-called "superoptimizers" that try to generate truly optimal code using automated verification machinery, i.e. machinery that can answer questions such as "Is there a way to compile this algorithm so that it takes fewer than X cycles?" Denali is one of the superoptimizers I have read about. The technique is simpler for some architectures than for others. The downside is that these superoptimizers can take hours, if not days, to compile a single simple routine, which is unacceptable to most people.

+1

One of the basic sanity tests for any compiler written in its own language is to have it compile itself, and then use the resulting new compiler to compile itself again. The two compilers produced this way should be identical, modulo timestamps.

+1

Summary

OK, so I agree with some of the answers, but I think people underestimate how rigorous and conservative compilers are in their transformations and optimizations. In short: although optimal code generation is indeed something of a black art, and heuristics rather than proofs are used 99% of the time (someone mentioned the Denali superoptimizer), correct code generation is, in fact, mathematically grounded and engineered to be so. That bugs exist anyway is explained by the enormous complexity of compiler code bases. As a side note, an automated theorem prover may itself have bugs even though every part of it is mathematically sound, so the fact that compilers have bugs does not mean they are random collections of rewrite rules.

Now, to answer your questions in turn:

How do compiler writers know when it is safe to interchange two loops?

There is a subfield of compiler theory called dependence analysis that deals with exactly these questions. There are entire books on the subject, but for your specific question: the compiler will determine the iteration space of the loop nest (if the bounds are computable; see the NB below), determine the dependences between the statements in the loop (there are several kinds of dependence), compute the loop distance and direction vectors, and then work out whether the interchange is legal. The Wikipedia article on dependence analysis only lists the different kinds of dependence, but it links to further subtopics and cites the book I mentioned.

And, in general, do they use mathematical proofs to demonstrate that their conclusions are valid?

No, not in general. It also depends on what we mean by proof: do you mean using a theorem prover? The Denali paper, for example, uses a SAT solver. The Dafny language and compiler from Microsoft Research in Cambridge verifies your code against annotations written in the source language. The compiler checks that your code is correct, although working out the annotations is very hard (I played with Dafny for a university project, and I speak from experience). So, "not in general": not gcc, not the llvm family, and, I would suspect (though I have not checked), not Intel's icc either. And again, we are talking here about compiling C-like source code down to machine level, nothing fancier.

How does a compiler writer know that their compiler generates correct code? How do they verify this conclusion? Do they have to write a test suite that runs the compiler and verifies the correctness of the generated code?

Other answers have covered this broadly, so there is not much to add. I will only say that, since compilers really are BEASTS in terms of complexity, testing them is very important. Also, I repeat that even around a mathematically sound piece of code (assuming we have such notions formally defined), there may be bugs in the infrastructure that processes that code, so testing always matters.

Verdict

Compilers are very complex. They must be fast, to allow seamless edit-compile-debug loops for developers, and they must also be correct. The correctness of the code compilers produce is, in a sense, "locally mathematically sound": every transformation the compiler performs must preserve the semantics of the original program. Speed improvements are driven by heuristic algorithms, and such algorithms are good examples of black-art techniques in the sense that they come with no proof that the resulting code will be faster. Nor do they come with a formal proof of correctness, but every transformation is designed to produce correct, semantics-preserving code. Finally, the hottest trend in compilers (as, I think, everywhere else in computer science) is to use machine learning for optimization, so we will have even more knobs to tune :) yay

NB

Almost everything in the back end of the compiler, i.e. the code generator, is NP-complete or undecidable. In particular, determining whether a loop interchange is legal is undecidable in general, since loop bounds can be arbitrary expressions (for example, function calls). However, if the bounds are computable (say, integer constants or something similarly simple), then dependence analysis kicks in.

+1

Source: https://habr.com/ru/post/920469/
