I am trying to understand the principles of machine code alignment. I have an assembler implementation that generates machine code at runtime. I align every branch target to 16 bytes, but this does not seem to be the best choice: when I remove the alignment, the same code sometimes runs faster. I suspect this is related to the cache line width: some instructions end up split across a cache line boundary, and the CPU stalls because of it. So if padding bytes are inserted in one place, they push later instructions further along, possibly across a cache line boundary ...
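To make the effect concrete, here is a minimal sketch of the arithmetic involved. The helper names `padding_for` and `crosses_boundary` are hypothetical, and the 64-byte line size is an assumption that should be checked against the actual target CPU.

```python
def padding_for(offset: int, alignment: int) -> int:
    """Bytes of padding needed to move `offset` up to the next multiple of `alignment`."""
    return -offset % alignment


def crosses_boundary(offset: int, length: int, line_size: int = 64) -> bool:
    """True if an instruction of `length` bytes starting at `offset` straddles a line boundary.

    line_size=64 is an assumed cache line width, not a measured one.
    """
    first_line = offset // line_size
    last_line = (offset + length - 1) // line_size
    return first_line != last_line


# A branch target at offset 10 needs 6 bytes of padding to reach offset 16.
pad = padding_for(10, 16)

# A 5-byte instruction at offset 55 fits inside one 64-byte line,
# but the same instruction shifted by that padding does not.
before = crosses_boundary(55, 5)
after = crosses_boundary(55 + pad, 5)
```

In this made-up layout, aligning one branch target is exactly what pushes a later instruction across the boundary, which is the behaviour I think I am seeing.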
I was hoping to introduce an automatic alignment procedure that processes the code as a whole and inserts padding according to the characteristics of the CPU (cache line width, 32/64-bit mode, etc.) ...
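Roughly, I imagine a pass like the sketch below. It is only an illustration of the shape of such a procedure: the function name `layout`, the input format, and the padding policy (pad a branch target only when its first instruction would otherwise straddle a block boundary) are all my own assumptions, not a known-good rule for any particular CPU. Instruction lengths are assumed to be positive.

```python
from typing import List, Tuple


def layout(instrs: List[Tuple[int, bool]], block: int = 16) -> List[int]:
    """Return the start offset of each instruction after padding.

    instrs: (length_in_bytes, is_branch_target) per instruction, in program order.
    block:  assumed boundary size in bytes (16 here; 64 would model a cache line).

    Hypothetical policy: a branch target is padded up to the next `block`
    boundary only when its first instruction would otherwise straddle one.
    """
    offsets = []
    pos = 0
    for length, is_target in instrs:
        straddles = pos // block != (pos + length - 1) // block
        if is_target and straddles:
            pos += -pos % block  # a real assembler would emit NOP padding here
        offsets.append(pos)
        pos += length
    return offsets


# Two 7-byte instructions, then a 5-byte branch target, then a 3-byte instruction.
example = layout([(7, False), (7, False), (5, True), (3, False)])
```

Here the branch target would start at offset 14 and spill over the 16-byte boundary, so it is moved to offset 16; a target that already fits inside a block is left alone. Whether this kind of rule is actually the right one is part of what I am asking.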
Can someone give some hints on this procedure? As an example, the target processor may be an Intel Core i7 64-bit platform.
Thanks.