First, you seem to have a very specific processor family in mind. Not every processor has an instruction that operates directly on memory.
Even where one exists, an instruction of this kind can be very complicated and expensive. If it is truly atomic, as you say, it must block all other bus transfers while it runs. This slows the computation down to the speed of the memory bus, which is usually much slower than the processor.
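To make the cost concrete, here is a minimal C++ sketch of an atomic read-modify-write on a shared counter. The helper name `count_with_atomic` and its parameters are made up for illustration. On x86 a compiler will typically turn `fetch_add` into a lock-prefixed instruction, but that is an assumption about the target; other architectures use different mechanisms, such as a load/store loop.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` threads each increment one shared
// counter `iters` times, and the final value is returned.
long count_with_atomic(int threads, int iters) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, iters] {
            for (int i = 0; i < iters; ++i) {
                // Atomic read-modify-write: no increment is lost,
                // but every call must synchronize with the other cores.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter.load();
}
```

Calling `count_with_atomic(4, 10000)` returns exactly 40000, because no update is lost. Timing it against a version where each thread sums into a local variable and adds the result once at the end should show how much the per-increment synchronization costs on your machine.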