New ARM processors include PLD and PLI instructions.
I write tight inner loops (in C++) that have a sequential memory access pattern, but a pattern that, naturally, my code fully understands. I would expect a significant speedup if I could prefetch the next location while processing the current one, and I would expect it to be quick enough to try out to be worth the experiment!
I am using the new, expensive compilers from ARM, and they do not seem to emit PLD instructions anywhere, let alone in the particular loop I care about.
How can I include explicit prefetch instructions in my C++ code?
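To make the question concrete, here is a minimal sketch of the kind of loop I mean. It assumes a compiler that accepts the GCC/Clang-style `__builtin_prefetch` builtin, which, as far as I understand, those compilers lower to PLD on ARM targets. The names `prefetch_read` and `sum_in_order`, the index table `order`, and the default `distance` of 8 are all made up for illustration, and I have not measured whether this helps on any particular core.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: issues a read-prefetch hint where the builtin is
// available, and compiles to nothing otherwise.
inline void prefetch_read(const void* p) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(p, 0 /* prefetch for read */, 3 /* high temporal locality */);
#else
    (void)p;  // no known prefetch intrinsic: silently skip the hint
#endif
}

// Sums data[order[i]] for every i. The access order is known in advance,
// so the element needed `distance` iterations later is prefetched while
// the current one is processed. `distance` is an assumed tuning knob.
std::uint64_t sum_in_order(const std::vector<std::uint32_t>& data,
                           const std::vector<std::size_t>& order,
                           std::size_t distance = 8) {
    std::uint64_t total = 0;
    const std::size_t n = order.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (i + distance < n) {
            prefetch_read(&data[order[i + distance]]);
        }
        total += data[order[i]];
    }
    return total;
}
```

The prefetch is only a hint, so the result is the same with or without it. What I am after is the intended way to get the hint into the generated code.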