Another issue is code alignment and other low-level optimizations. Modern CPUs are so complex that it is hard to predict which technique will make the final binary run faster.
As a real-life example, let's take a look at Google Native Client. I mean the original NaCl approach, not the LLVM one (as far as I know, both "nativeclient" and (modified) "LLVM bitcode" are currently supported).
As you can see in the presentations (look on youtube.com) or in papers such as "Native Client: A Sandbox for Portable, Untrusted x86 Native Code", even though their alignment technique makes the code larger, in some cases this kind of instruction alignment (for example, padding with no-ops) gives better cache behavior.
Aligning instructions with no-ops and reordering instructions are techniques already known from parallel computing, and this example shows that they make a difference here as well.
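To make the size cost of such alignment concrete, here is a minimal sketch. It is not NaCl's actual toolchain code; it only models the rule described in the paper, that no instruction may straddle a 32-byte boundary. The function name `pad_to_bundles` and the instruction lengths are made up for illustration.

```python
BUNDLE = 32  # bundle size in bytes, as described in the Native Client paper


def pad_to_bundles(lengths, bundle=BUNDLE):
    """Lay out instructions so that none crosses a bundle boundary.

    lengths: instruction sizes in bytes (hypothetical values).
    Returns (start offsets, total size, number of no-op padding bytes).
    """
    offset = 0
    padding = 0
    layout = []
    for n in lengths:
        if n > bundle:
            raise ValueError("instruction longer than a bundle")
        room = bundle - offset % bundle  # bytes left in the current bundle
        if n > room:
            # The instruction would straddle the boundary:
            # fill the rest of the bundle with no-ops and start a new one.
            padding += room
            offset += room
        layout.append(offset)
        offset += n
    return layout, offset, padding


if __name__ == "__main__":
    lengths = [5, 7, 10, 6, 8, 3, 15]  # hypothetical instruction sizes
    layout, total, padding = pad_to_bundles(lengths)
    print("offsets:", layout)
    print("raw size:", sum(lengths), "padded size:", total, "no-op bytes:", padding)
```

The padded layout is never smaller than the raw one, so any speed-up has to come from how the aligned code interacts with fetch and cache, and that is exactly what has to be measured rather than predicted.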
I hope this answer gives an idea of how many circumstances can affect the speed of code execution. There are many possible reasons, different for different pieces of code, and each of them needs to be investigated. Nevertheless, this is an interesting topic, so if you find out more, do not hesitate to update your post and tell us in a "Post-Scriptorium" what you found :). (Maybe a link to a whitepaper / devblog with new findings :)). Benchmarks are always welcome - see: http://llvm.org/OpenProjects.html#benchmark .
Grzegorz Wierzowiecki Aug 30 2018-11-21T00: 00Z