One of the most interesting techniques is avoiding cache conflicts. If you know the memory access pattern, you can lay out the affected items so as to minimize cache-line conflicts between the data being accessed. You can do this for both data and code.
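To make the data side concrete, here is a minimal sketch of how two arrays whose placement differs by a multiple of the cache size map to the same sets of a direct-mapped cache, and how a single line of padding removes every conflict. The cache parameters (32 KiB, 64-byte lines, direct-mapped) are assumptions for illustration, not taken from the text.

```python
CACHE_SIZE = 32 * 1024   # assumed: 32 KiB direct-mapped cache
LINE_SIZE = 64           # assumed: 64-byte cache lines

def cache_set(addr):
    """Index of the cache set (line slot) that a byte address maps to."""
    return (addr % CACHE_SIZE) // LINE_SIZE

# Two arrays placed exactly CACHE_SIZE apart: every line of one evicts
# the corresponding line of the other.
a_base, b_base = 0, CACHE_SIZE
conflicts = sum(cache_set(a_base + i) == cache_set(b_base + i)
                for i in range(0, 4096, LINE_SIZE))

# Inserting one cache line of padding shifts every set index of b by one,
# so the two arrays no longer collide.
b_padded = CACHE_SIZE + LINE_SIZE
conflicts_padded = sum(cache_set(a_base + i) == cache_set(b_padded + i)
                       for i in range(0, 4096, LINE_SIZE))

print(conflicts, conflicts_padded)  # → 64 0
```

The same arithmetic is what a layout-aware compiler or allocator performs when it pads or offsets objects that are accessed together.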
Identifying data access patterns is relatively difficult, but code access patterns are comparatively easy to determine. Given the call graph, the basic blocks that make up the function bodies, and some estimate of the transition frequencies between blocks, you can place the code blocks so as to maximize the likelihood that the next block you need lies in a cache line that does not conflict with the current one. One interesting idea was that you only need to place the "hot" blocks (those with a high probability of execution) carefully; it doesn't matter where you put the cold ones. IIRC, this means you can sort the blocks by their likely execution frequency and then place them in that order.
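The sort-by-frequency idea can be sketched in a few lines. This is a simplified greedy placement, not any particular compiler's algorithm; the block names, sizes, and frequencies are invented, and a real implementation would also use the transition frequencies between blocks, not just per-block counts.

```python
# Hypothetical basic blocks: name -> (size in bytes, execution count).
blocks = {
    "entry":      (48, 1000),
    "loop_body":  (96, 5000),
    "loop_exit":  (32, 1000),
    "error_path": (128, 3),
    "cleanup":    (64, 2),
}

HOT_THRESHOLD = 100  # assumed cutoff between hot and cold blocks

def layout(blocks):
    # Sort by descending execution count, then pack the hot blocks first,
    # contiguously: the most frequently executed code ends up on distinct
    # cache lines, and the cold blocks land wherever is left.
    order = sorted(blocks, key=lambda b: -blocks[b][1])
    hot = [b for b in order if blocks[b][1] >= HOT_THRESHOLD]
    cold = [b for b in order if blocks[b][1] < HOT_THRESHOLD]
    addr, placement = 0, {}
    for b in hot + cold:
        placement[b] = addr
        addr += blocks[b][0]
    return placement

placement = layout(blocks)
print(placement["loop_body"])  # → 0 (hottest block placed first)
```

With these numbers, the loop body lands at the start of the layout and the rarely taken error and cleanup paths are pushed past all the hot code, so they never displace it from the cache.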
You do need a global analysis :-} In the first account of this I read, the optimizer was actually implemented as part of the linker, which is one way to get access to the whole program.
I don't remember either a good survey or an established set of methods. However, the PLDI conferences have research papers on this topic.