It would seem a logical conclusion for superscalar processors with multiple load units. Multi-channel memory controllers are fairly common these days.
With out-of-order execution, an enormous amount of logic is spent determining whether instructions depend on others in the stream: not only register dependencies, but also dependencies between memory operations. There is also a great deal of logic for handling exceptions: the processor must complete every instruction in the stream up to the faulting one (or, alternatively, offload parts of this work to the operating system).
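To make the register-dependency part concrete, here is a minimal sketch in C of the check the hardware has to perform before it may swap two instructions. The `Insn` layout and the function names are hypothetical and exist only for illustration; real processors do this in parallel in hardware and use register renaming to eliminate the write-after-read and write-after-write cases.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified instruction: one destination register
 * and two source registers. Not modelled on any real ISA. */
typedef struct {
    uint8_t dst;
    uint8_t src1;
    uint8_t src2;
} Insn;

/* Read-after-write: the later instruction reads a register
 * that the earlier instruction writes. */
static bool raw_hazard(const Insn *earlier, const Insn *later) {
    return later->src1 == earlier->dst || later->src2 == earlier->dst;
}

/* Write-after-read: the later instruction overwrites a register
 * that the earlier instruction still needs to read. */
static bool war_hazard(const Insn *earlier, const Insn *later) {
    return later->dst == earlier->src1 || later->dst == earlier->src2;
}

/* Write-after-write: both instructions write the same register. */
static bool waw_hazard(const Insn *earlier, const Insn *later) {
    return later->dst == earlier->dst;
}

/* Without renaming, two instructions may be reordered only if
 * none of the three register hazards exists between them. */
static bool can_reorder(const Insn *earlier, const Insn *later) {
    return !raw_hazard(earlier, later) &&
           !war_hazard(earlier, later) &&
           !waw_hazard(earlier, later);
}
```

This covers registers only. The memory-operation dependencies mentioned above are harder, because the addresses involved are often not known until the instructions execute.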
In terms of the programming model seen by most applications, the effects are never visible. As seen from memory, it means that loads will not always happen in the expected sequence, but that is already the case when caches are in use.
Obviously, where the order of loads and stores matters, for example when accessing device registers, OOE must be disabled. For this purpose, the POWER architecture has the wonderfully named EIEIO (Enforce In-order Execution of I/O) instruction.
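As a rough sketch of what forcing the order looks like from C, the following writes a data register and then a command register with a barrier in between. The `FakeDevice` layout, `CMD_START`, `io_barrier` and `device_send` are all hypothetical names made up for this example. On POWER with a GCC-compatible compiler the barrier emits `eieio`; elsewhere it falls back to a C11 fence, which is only a stand-in: real memory-mapped I/O on other architectures needs that platform's own I/O barrier.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical device register layout, for illustration only. */
typedef struct {
    volatile uint32_t data;     /* value to hand to the device */
    volatile uint32_t command;  /* writing CMD_START triggers it */
} FakeDevice;

#define CMD_START 1u  /* hypothetical command code */

/* Keep the stores on either side of this call in program order. */
static inline void io_barrier(void) {
#if defined(__powerpc__) && defined(__GNUC__)
    /* POWER: Enforce In-order Execution of I/O. */
    __asm__ volatile("eieio" ::: "memory");
#else
    /* Assumption: a placeholder for the platform's I/O barrier.
     * A C11 fence orders ordinary memory accesses between threads
     * and is not guaranteed to order accesses to device memory. */
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* The data register must be written before the command register,
 * otherwise the device could start with stale data. */
static void device_send(FakeDevice *dev, uint32_t value) {
    dev->data = value;
    io_barrier();
    dev->command = CMD_START;
}
```

The `volatile` qualifiers only stop the compiler from dropping or reordering these accesses relative to each other; the barrier is what constrains the processor.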
Some members of the ARM Cortex-A family offer OOE. I suspect that, given the power constraints of these devices and the apparent lack of instructions for forcing ordering, loads and stores always complete in order.
marko