ARM processors typically have both I/D caches and a write buffer. The idea of a write buffer is to gang sequential writes together (great for synchronous DRAM) and to avoid stalling the CPU while it waits for a write to complete.
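To make the write-merging idea concrete, here is a toy model in plain C. It is only a sketch: `count_bursts` and the `max_words` limit are made-up names, and real write buffers differ from core to core. It counts how many bursts would reach memory if consecutive word writes to adjacent addresses were merged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, not a description of any particular ARM core.
 * Counts the bursts issued to memory when consecutive 32-bit writes
 * to adjacent addresses are merged, up to max_words per burst. */
size_t count_bursts(const uint32_t *addrs, size_t n, size_t max_words)
{
    assert(max_words > 0);

    size_t bursts = 0;
    size_t in_burst = 0;    /* words merged into the current burst */

    for (size_t i = 0; i < n; i++) {
        /* Sequential means exactly one word after the previous write. */
        int sequential = i > 0 && addrs[i] == addrs[i - 1] + 4u;

        if (in_burst == 0 || !sequential || in_burst == max_words) {
            bursts++;       /* start a new burst */
            in_burst = 1;
        } else {
            in_burst++;     /* merge into the current burst */
        }
    }
    return bursts;
}
```

In this model, four writes to adjacent words collapse into a single burst when `max_words` is 4, while four scattered writes cost four separate bursts.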
To be generic, you can clean the D-cache and drain the write buffer. The following is some inline ARM assembler that should work for many architectures and memory configurations.
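static inline void dcache_clean(void)
{
    const int zero = 0;
    /* Clean the whole D-cache (test-and-clean loop). */
    __asm volatile ("1: mrc p15, 0, r15, c7, c10, 3\n"
                    " bne 1b\n" ::: "cc", "memory");
    /* Drain the write buffer. */
    __asm volatile ("mcr p15, 0, %0, c7, c10, 4" :: "r" (zero) : "memory");
}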
You may need more if you have an L2 cache.
To answer in the context of Linux, there are various CPU variants and different routines depending on the memory/MMU configuration and even CPU errata. See for example
These routines are either called directly or looked up in a CPU info structure that holds function pointers to the appropriate routine for the detected CPU and configuration, depending on whether the kernel is built for a single CPU or is multi-purpose, like an Ubuntu distribution kernel.
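The function-pointer lookup can be sketched roughly as below. Everything here is hypothetical: `struct cache_ops`, `cpu_table`, `lookup_cache_ops` and the ID values are invented for illustration and are not the kernel's actual structures. The demo routines only bump a counter where real code would issue cache maintenance instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU operations table (not the kernel's real layout). */
struct cache_ops {
    const char *name;
    void (*clean_dcache)(void);
    void (*drain_write_buffer)(void);
};

struct cpu_entry {
    uint32_t id_value;              /* expected ID bits after masking */
    uint32_t id_mask;
    const struct cache_ops *ops;
};

static int cleaned_count;           /* stands in for hardware side effects */
static void demo_clean(void) { cleaned_count++; }
static void demo_drain(void) { }

static const struct cache_ops demo_ops_a = { "demo-a", demo_clean, demo_drain };
static const struct cache_ops demo_ops_b = { "demo-b", demo_clean, demo_drain };

/* Made-up ID values; a real table would match the CPU ID register. */
static const struct cpu_entry cpu_table[] = {
    { 0x0000a000u, 0x0000f000u, &demo_ops_a },
    { 0x0000b000u, 0x0000f000u, &demo_ops_b },
};

/* Return the ops for the detected CPU, or NULL if it is unknown. */
const struct cache_ops *lookup_cache_ops(uint32_t cpu_id)
{
    for (size_t i = 0; i < sizeof cpu_table / sizeof cpu_table[0]; i++) {
        if ((cpu_id & cpu_table[i].id_mask) == cpu_table[i].id_value)
            return cpu_table[i].ops;
    }
    return NULL;
}

/* Push CPU writes out to memory using whichever routines were selected. */
void sync_for_device(const struct cache_ops *ops)
{
    assert(ops != NULL);
    ops->clean_dcache();
    ops->drain_write_buffer();
}
```

A multi-purpose kernel would do the lookup once at boot and keep the pointer. A single-CPU build could skip the table and call the routines directly.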
To answer the question for your specific situation, we need to know the L2 cache, write-buffered memory, and CPU architecture specifics, possibly including silicon revisions for errata. Another tactic is to avoid this entirely by using the dma_alloc_XXX() routines, which mark the memory as uncached and unbuffered so that CPU writes are pushed out immediately. Depending on your memory access pattern, either solution is valid. You may wish to keep the memory cached if it only needs to be synchronized at some checkpoint (vsync/hsync for video, etc.).
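The checkpoint approach might look something like the following in outline. This is a sketch with invented names (`struct framebuf`, `fb_write`, `fb_checkpoint`); the `clean` hook stands in for whatever platform routine cleans the D-cache and drains the write buffer. Writes stay in cached memory and are pushed out at most once per checkpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FB_WORDS 16u

/* Hypothetical frame buffer living in ordinary cached memory. */
struct framebuf {
    uint32_t pixels[FB_WORDS];
    bool dirty;               /* written since the last checkpoint? */
    void (*clean)(void);      /* platform hook: clean D-cache, drain write buffer */
    unsigned clean_calls;     /* bookkeeping for this demo only */
};

void fb_write(struct framebuf *fb, size_t idx, uint32_t value)
{
    assert(fb != NULL && idx < FB_WORDS);
    fb->pixels[idx] = value;  /* may sit in the cache for now */
    fb->dirty = true;
}

/* Call once per checkpoint, e.g. from a vsync handler. */
void fb_checkpoint(struct framebuf *fb)
{
    assert(fb != NULL);
    if (!fb->dirty)
        return;               /* nothing to push out */
    if (fb->clean)
        fb->clean();          /* one clean covers all writes since last time */
    fb->clean_calls++;
    fb->dirty = false;
}
```

Many writes per frame then cost a single clean at the checkpoint, which is the case where keeping the memory cached pays off. With the uncached dma_alloc_XXX() approach, none of this bookkeeping is needed.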