Flush cache on DRAM

Question

I am using the Xilinx Zynq platform with a memory area shared between a programmable HW and an ARM processor.

I reserved this memory using memmap on the kernel command line and then exposed it to user space via mmap / io_remap_pfn_range calls in my driver.

The problem I am having is that writes take some time to show up in DRAM, and I presume they are stuck in the dcache. There is a set of flush_cache_* calls defined, but none of them are exported, which is a clue to me that I am barking up the wrong tree...

As a trial, I locally exported flush_cache_mm just to see what would happen, and no joy.

In short, how can I be sure that any writes to these mmap'd regions have been committed to DRAM?

Thanks.

+6

arm linux-kernel xilinx zynq

Brian magnuson Sep 19 '13 at 14:01

3 answers

artless noise · Answer 1 · 2013-09-19T14:22:50+0000

ARM processors typically have both an I/D cache and a write buffer. The idea of a write buffer is to gang sequential writes together (great for synchronous DRAM) and to avoid stalling the CPU while it waits for a write to complete.

To be generic, you can clean the D cache and drain the write buffer. The following is some inline ARM assembler that should work for many architectures and memory configurations.

    static inline void dcache_clean(void)
    {
        const int zero = 0;

        /* clean entire D cache -> push to external memory. */
        __asm volatile ("1: mrc p15, 0, r15, c7, c10, 3\n"
                        "   bne 1b\n" ::: "cc");
        /* drain the write buffer */
        __asm volatile ("mcr p15, 0, %0, c7, c10, 4" :: "r" (zero));
    }

You may need more if you have an L2 cache.

To answer in a Linux context, there are different CPU variants and different routines depending on the memory/MMU configuration and even CPU errata. See for example

These routines are either called directly or looked up in a CPU info structure with function pointers to the appropriate routine for the detected CPU and configuration, depending on whether the kernel is built specifically for a single CPU or is multi-purpose, like a Ubuntu distribution kernel.

To answer the question specifically for your situation, we need to know the L2 cache, whether the memory is write-buffered, and the CPU architecture specifics, possibly including silicon revisions for errata. Another tactic is to avoid this completely by using the dma_alloc_XXX() routines, which mark the memory as un-cacheable and un-bufferable so that CPU writes are pushed out immediately. Depending on your memory access pattern, either solution is valid. You may wish to cache if the memory only needs to be synchronized at some checkpoint (vsync/hsync for video, etc.).

user2365669 · Answer 2 · 2014-09-12T15:06:14+0000

I hit the same problem on a Zynq. I finally got L2 flushed/invalidated with:

    #include <asm/outercache.h>

    outer_cache.flush_range(start, size);
    outer_cache.inv_range(start, size);

start is a kernel virtual space pointer. You also need to flush L1 to L2:

    __cpuc_flush_dcache_area(start, size);

I am not sure whether L1 needs to be invalidated before reading, and I have not found a function to do this. I assume it does, and that so far I have only been lucky...
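The range calls above take a start and a size, but cache maintenance itself works on whole lines, so invalidating a buffer that shares a line with unrelated dirty data can discard that data. The following is a minimal sketch of the alignment arithmetic only, assuming 32-byte lines (the line size of the Cortex-A9 L1 and the PL310 L2). The names `cache_line_span` and `is_line_exclusive` are hypothetical helpers for illustration, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: 32-byte cache lines (Cortex-A9 L1 and PL310 L2). */
#define CACHE_LINE_SIZE 32u

struct maint_range {
    uintptr_t start; /* inclusive, line-aligned */
    uintptr_t end;   /* exclusive, line-aligned */
};

/* Hypothetical helper: widen [addr, addr + size) to whole cache lines. */
static struct maint_range cache_line_span(uintptr_t addr, size_t size)
{
    const uintptr_t mask = (uintptr_t)CACHE_LINE_SIZE - 1u;
    struct maint_range r;

    assert(addr + size >= addr); /* range must not wrap around */
    r.start = addr & ~mask;               /* round start down */
    r.end = (addr + size + mask) & ~mask; /* round end up */
    return r;
}

/* Returns 1 when the buffer starts and ends on line boundaries,
 * i.e. it shares no cache line with neighbouring data. */
static int is_line_exclusive(uintptr_t addr, size_t size)
{
    const uintptr_t mask = (uintptr_t)CACHE_LINE_SIZE - 1u;

    return (addr & mask) == 0 && (size & mask) == 0;
}
```

Allocating the shared buffer so that `is_line_exclusive` holds means a later invalidate cannot touch anything outside the buffer.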

It seems that all the suggestions I found on the net assume the device is "inside" the L2 cache coherency domain, so they did not work when the AXI-HP ports were used. With the AXI-ACP port, the L2 flush was not needed. (For those not familiar with Zynq: the HP ports access the DRAM controller directly, bypassing any cache/MMU implemented on the ARM side.)

Alex hornung · Answer 3 · 2013-09-25T07:07:47+0000

I am not familiar with Zynq, but you essentially have two options that really work:

either include your other logic on the FPGA in the same coherency domain (if Zynq has an ACP port, for example)
or mark the memory that you mmap as device memory (or other non-cacheable memory, if you do not care about gathering, reordering and early write acknowledgement) and use a DSB after any write that should be seen.
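The second option can be sketched from the user-space side as follows. This is only an illustration under stated assumptions: it assumes the driver has already mapped the region as device or non-cacheable memory, and `write_barrier` and `publish_words` are hypothetical names. On non-ARM hosts the barrier falls back to a compiler builtin so the sketch still builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full barrier: DSB on ARMv7+/AArch64, compiler builtin elsewhere. */
static inline void write_barrier(void)
{
#if defined(__aarch64__) || \
    (defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7)
    __asm__ volatile("dsb sy" ::: "memory");
#else
    __sync_synchronize(); /* fallback so the sketch builds on other hosts */
#endif
}

/* Hypothetical helper: copy words into the shared region, then issue
 * the barrier so all writes complete before the function returns.
 * ASSUMPTION: dst points into a mapping the driver marked
 * non-cacheable; on cacheable memory the barrier alone is not enough. */
static void publish_words(volatile uint32_t *dst, const uint32_t *src,
                          size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    write_barrier();
}
```

The `volatile` qualifier keeps the compiler from merging or dropping the stores; the barrier then orders them with respect to whatever signals the FPGA logic afterwards.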

If the memory is marked as cacheable and your other observer is not in the same coherency domain, you are asking for trouble: when you clean the D-cache with DCCISW or a similar op and you have an L2 cache, that is where everything will end up.
