Summary
OK, so I agree with some of the answers, but I think people underestimate how strict and conservative compilers are in their transformations and optimizations. In short: although optimal code generation is indeed something of a black art, and heuristics are used 99% of the time rather than proofs (someone mentioned Denali as a superoptimizer), correct code generation is in fact mathematically justified, and compilers are designed for it. The bugs that do exist are explained by the enormous complexity of compiler code bases. As a side note, an automated bug analyzer may itself have bugs even though each of its parts is mathematically justified, so the fact that compilers have bugs does not mean they are random collections of rewrite rules.
Now, to answer your questions in turn:
How does a compiler writer know when it is legal for the compiler to interchange loops?
There is a subfield of compiler theory called dependence analysis that deals with exactly such questions. There are entire books on the topic, but for your specific question: the compiler will automatically determine the iteration space of the loop (if the bounds are computable; see the NB below), determine the dependences between the loop's statements (there are different kinds of dependence), compute the loop's distance and direction vectors, and from those work out whether the interchange is legal. The Wikipedia article on the subject only lists the different kinds of dependence, but it links to the subtopics and also cites the book I mentioned.
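To make the interchange question concrete, here is a minimal Python sketch (my own toy example, not compiler code): the loop body reads `a[i-1][j+1]`, giving a dependence distance vector of (1, -1), and interchanging the loops changes the result. Dependence analysis detects exactly this situation symbolically, without running the loop.

```python
# Toy demonstration: run the same loop nest in both orders and compare.
# The statement a[i][j] = a[i-1][j+1] + 1 has distance vector (1, -1);
# interchanging makes the leading component negative, so it is illegal.

N = 4

def original():
    a = [[0] * (N + 1) for _ in range(N + 1)]
    for i in range(1, N):          # i outer
        for j in range(1, N):      # j inner
            a[i][j] = a[i - 1][j + 1] + 1
    return a

def interchanged():
    a = [[0] * (N + 1) for _ in range(N + 1)]
    for j in range(1, N):          # loops swapped: j outer
        for i in range(1, N):      # i inner
            a[i][j] = a[i - 1][j + 1] + 1
    return a

print(original() == interchanged())   # -> False: the interchange broke it
```

If the body had no cross-iteration dependence (say, `a[i][j] = i * j`), both orders would produce the same array and the interchange would be legal.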
And, in general, do they use mathematical proofs to justify their conclusions?
No, not in general. It also depends on what we mean by proof: do you mean using a theorem prover? The Denali paper, for example, uses a SAT solver. The Dafny compiler and language from Microsoft Research Cambridge verifies your code against annotations in the source language; the compiler thereby checks the correctness of your code, although working with the annotations is very difficult (I fiddled with Dafny for a university project, and I speak from experience). Also, by "not in general" I am referring to gcc, the llvm family, and I would suspect (but have not checked) Intel's icc. And again, we are talking here about C-like source compiled down to the machine level, nothing more exotic.
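To illustrate the SAT-solver spirit of tools like Denali, here is a toy analogue (entirely my own sketch, not how Denali is implemented): instead of a symbolic solver, it brute-forces the whole 8-bit domain to verify that a candidate rewrite preserves semantics.

```python
# Toy analogue of a superoptimizer's equivalence query: exhaustively
# verify that a candidate rewrite is semantics-preserving over the whole
# (small) input domain. Real tools answer this symbolically with SAT/SMT.

MASK = 0xFF  # model 8-bit unsigned arithmetic

def slow(x):
    return (x * 2) & MASK        # original: multiply

def fast(x):
    return (x << 1) & MASK       # candidate rewrite: shift

# A single mismatch would refute the rewrite; none exists here.
assert all(slow(x) == fast(x) for x in range(256))
print("rewrite verified on all 256 inputs")
```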
How does a compiler writer know that their compiler generates correct code? How do they verify this? Do they need to write a test suite that runs the compiler and checks that the generated code is correct?
Other answers have covered this broadly, so there is not much to say. I can add that since compilers really are BEASTS in terms of complexity, testing them is very important. I would also repeat that even in a mathematically sound piece of code (assuming we have such notions formally defined), there may be bugs in the infrastructure that processes that code, so testing is always important.
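As a miniature sketch of what such compiler testing can look like (all names here are hypothetical stand-ins): differential testing generates random programs and checks that two independent implementations agree on them. This is the idea behind compiler fuzzers such as Csmith, which apply it to whole C programs and real compilers.

```python
# Minimal differential-testing harness (toy sketch): generate random
# arithmetic expressions and check that two "implementations" agree.
# In a real setting the two evaluators would be, e.g., an interpreter
# and compiled output, or two different compilers.
import random

def gen_expr(depth=3):
    """Generate a random parenthesized integer expression."""
    if depth == 0:
        return str(random.randint(0, 9))
    op = random.choice(["+", "-", "*"])
    return f"({gen_expr(depth - 1)} {op} {gen_expr(depth - 1)})"

def eval_reference(src):
    return eval(src)             # stand-in for implementation A

def eval_under_test(src):
    return eval(src)             # stand-in for implementation B

random.seed(0)
for _ in range(100):
    src = gen_expr()
    assert eval_reference(src) == eval_under_test(src), src
print("100 random programs agree")
```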
Verdict
Compilers are very complex. They must be fast, to allow a seamless edit-compile-debug developer loop, as well as correct. The correctness of the code compilers produce is a kind of "local mathematical soundness": every transformation the compiler performs must preserve the semantics of the original program. Improvements in speed are driven by heuristic algorithms, and such algorithms are good examples of black-art techniques in the sense that they come with no proof that the resulting code will be faster. They also come with no formal proof that it will be correct, but each transformation is designed to produce semantics-preserving code. Finally, the hottest trend in compilers (as, I think, everywhere else in computer science) is to use machine learning for optimization, so we will have even more knobs to tune :) yay
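The "every transformation must preserve semantics" point can be sketched in a few lines of Python (an assumed toy AST of my own, not any real compiler's IR): a constant-folding pass, plus a reference interpreter that defines the meaning the pass must preserve.

```python
# Sketch: constant folding on a tiny expression tree, with a reference
# interpreter used to check the pass is semantics-preserving.
# Nodes are ints (constants), strings (variables), or ('+'|'*', l, r).

def fold(node):
    """Recursively replace constant subexpressions with their value."""
    if not isinstance(node, tuple):
        return node                         # constant or variable leaf
    op, l, r = node
    l, r = fold(l), fold(r)
    if isinstance(l, int) and isinstance(r, int):
        return l + r if op == "+" else l * r   # fold now
    return (op, l, r)

def run(node, env):
    """Reference interpreter: defines the semantics to be preserved."""
    if isinstance(node, int):
        return node
    if isinstance(node, str):
        return env[node]
    op, l, r = node
    a, b = run(l, env), run(r, env)
    return a + b if op == "+" else a * b

expr = ("+", ("*", 2, 3), "x")    # 2*3 + x
folded = fold(expr)               # -> ("+", 6, "x")
assert run(expr, {"x": 5}) == run(folded, {"x": 5}) == 11
```

The assertion at the end is exactly the "locally sound" property in miniature: same input environment, same output, before and after the pass.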
NB
Almost everything in the back end of the compiler, that is, the code generator, is NP-complete or undecidable. In particular, determining whether a loop interchange is legal is undecidable in general, since loop bounds can be arbitrary expressions (for example, function calls). However, if the bounds are computable (say, integers or something similarly simple), then dependence analysis kicks in.