How to do mmap for a cacheable PCIe BAR

I am trying to write a driver with a custom mmap() function for a PCIe BAR, with the goal of making this BAR cacheable in the processor cache. I know that this is not the best way to achieve maximum throughput and that the write ordering is unpredictable (that is not a problem in this case).

This is similar to what is described in How to prevent caching of MMAP values?

The processor is a Sandy Bridge i7; the PCIe device is an Altera Stratix IV dev. board.

First, I tried to do this on CentOS 5 (2.6.18). I changed the MTRR settings to make sure that the BAR is not within an uncacheable MTRR, and used io_remap_pfn_range() with the _PAGE_PCD and _PAGE_PWT bits cleared. Reads work as expected: they return correct values, and a second read to the same address does not necessarily cause the read to go out to PCIe (I checked the read counter in the FPGA). However, writes cause the system to freeze and then reboot without any messages in the logs or on the screen.
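As a rough illustration of what the PCD and PWT bits select, here is a minimal user-space sketch. It assumes the power-on default contents of IA32_PAT as listed in the Intel SDM, which is what a kernel without PAT support leaves in place; kernels with PAT support reprogram the table, and the effective memory type is additionally combined with the MTRR type. The names `pat_index` and `pte_memtype` are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>

/* x86 page-table attribute bits for 4 KiB pages
 * (same bit positions as the kernel's _PAGE_PWT, _PAGE_PCD, _PAGE_PAT). */
#define PTE_PWT (1u << 3)
#define PTE_PCD (1u << 4)
#define PTE_PAT (1u << 7)

/* Assumption: power-on default IA32_PAT contents, not reprogrammed by the OS. */
static const char *const default_pat[8] = {
    "WB", "WT", "UC-", "UC", "WB", "WT", "UC-", "UC"
};

/* The three bits form an index into the PAT: PAT*4 + PCD*2 + PWT. */
static unsigned pat_index(uint64_t pte)
{
    return ((pte & PTE_PAT) ? 4u : 0u) |
           ((pte & PTE_PCD) ? 2u : 0u) |
           ((pte & PTE_PWT) ? 1u : 0u);
}

/* Memory type requested by the page-table entry alone (MTRRs ignored). */
static const char *pte_memtype(uint64_t pte)
{
    return default_pat[pat_index(pte)];
}
```

Under these assumptions, an entry with both bits clear selects PAT entry 0 (WB), while an entry with both bits set selects entry 3 (UC).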

Second, I tried to do this on CentOS 6 (2.6.32), which has PAT support. The result is the same: reads work correctly, writes cause the system to freeze and reboot. Interestingly, non-temporal full cache-line writes (AVX/SSE) work as expected, i.e. they always go to the FPGA, the FPGA observes a full cache-line write, and subsequent reads return correct values. However, a simple 64-bit write still causes the system to freeze/reboot.
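For reference, a full cache-line non-temporal write of the kind described here might look like the following sketch using SSE2 intrinsics. It assumes a 64-byte line and a 64-byte-aligned destination; `write_line_nt` is a hypothetical helper name. Whether the four stores reach the device as a single PCIe write depends on the processor's write-combining behaviour and is not guaranteed by this code.

```c
#include <emmintrin.h>  /* SSE2: _mm_stream_si128, _mm_loadu_si128, _mm_sfence */

#define CACHE_LINE 64

/* Write one full 64-byte line using non-temporal (streaming) stores.
 * dst must be 64-byte aligned; src may be unaligned. */
static void write_line_nt(void *dst, const void *src)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (int i = 0; i < CACHE_LINE / 16; i++)
        _mm_stream_si128(d + i, _mm_loadu_si128(s + i));

    /* Order the streaming stores before any later stores. */
    _mm_sfence();
}
```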

I also tried ioremap_cache() and then iowrite32() inside the driver code. The result is the same.

I think this is a hardware problem, but I would appreciate it if anyone could share any ideas on what is going on.

EDIT: I was able to capture the MCE message on CentOS 6: Machine Check Exception: 5 Bank 5: be2000000003110a.
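The bank value can be split into its architectural fields. The sketch below assumes the 64-bit number printed after "Bank 5:" is that bank's IA32_MCi_STATUS register and uses the bit layout from the Intel SDM; `struct mci_status` and `decode_mci_status` are hypothetical names. It only extracts the fields and does not interpret the model-specific or MCA error codes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selected architectural fields of IA32_MCi_STATUS. */
struct mci_status {
    bool val;            /* bit 63: register contents valid       */
    bool over;           /* bit 62: error overflow                */
    bool uc;             /* bit 61: uncorrected error             */
    bool en;             /* bit 60: error reporting enabled       */
    bool miscv;          /* bit 59: IA32_MCi_MISC valid           */
    bool addrv;          /* bit 58: IA32_MCi_ADDR valid           */
    bool pcc;            /* bit 57: processor context corrupt     */
    uint16_t model_code; /* bits 31:16: model-specific error code */
    uint16_t mca_code;   /* bits 15:0: MCA error code             */
};

static struct mci_status decode_mci_status(uint64_t s)
{
    struct mci_status d;

    d.val   = (s >> 63) & 1;
    d.over  = (s >> 62) & 1;
    d.uc    = (s >> 61) & 1;
    d.en    = (s >> 60) & 1;
    d.miscv = (s >> 59) & 1;
    d.addrv = (s >> 58) & 1;
    d.pcc   = (s >> 57) & 1;
    d.model_code = (uint16_t)(s >> 16);
    d.mca_code   = (uint16_t)s;
    return d;
}
```

Applied to 0xbe2000000003110a, this reports a valid, uncorrected error with the processor-context-corrupt bit set, a model-specific code of 0x0003 and an MCA error code of 0x110a.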

I also tried the same code on a 2-socket Sandy Bridge (Romley) system: read and non-temporal write behavior is the same; simple writes do not cause an MCE or a crash, but they have no effect on the system state, i.e. the value in memory does not change.

In addition, I tried the same code on an older 2-socket Nehalem system: simple writes also cause an MCE there, although the codes are different.

1 answer

I am not aware of any x86 hardware that supports the WriteBack (WB) memory type for MMIO addresses, and you are almost certainly seeing the results of this incompatibility. I posted a discussion of this topic on my blog at http://blogs.utexas.edu/jdm4372/2013/05/29/ and http://blogs.utexas.edu/jdm4372/2013/05/30/

In these posts, I discuss a method that works on some processors: double-map the MMIO range, once for store operations from the processor to the FPGA using the Write-Combining (WC) memory type, and once for reads from the processor to the FPGA using the Write-Protect (WP) or Write-Through (WT) types. You will need to maintain coherence manually by using CLFLUSH on the cache lines in the read-only region when you write to the alias of that line in the write-only region. You will also need to maintain coherence manually with respect to changes of values in the FPGA, since IO devices cannot generate cache invalidation transactions for MMIO addresses.
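The manual coherence protocol can be sketched in user-space C roughly as follows. This is only an illustration under stated assumptions, not the code from the blog posts: `struct dual_map`, `dm_write_line` and `dm_read64_fresh` are hypothetical names, the two pointers are assumed to be already-established mappings of the same BAR (a WC alias for writes and a WP or WT alias for reads), and a 64-byte cache line is assumed.

```c
#include <emmintrin.h>  /* _mm_stream_si128, _mm_clflush, _mm_sfence, _mm_mfence */
#include <stddef.h>
#include <stdint.h>

/* Two aliases of the same MMIO range (assumed to be mapped elsewhere). */
struct dual_map {
    uint8_t *wr;        /* write-only alias, WC memory type      */
    const uint8_t *rd;  /* read-only alias, WP or WT memory type */
};

/* Write one 64-byte line through the write alias, then discard any stale
 * copy of that line cached through the read alias.
 * off must be a multiple of 64. */
static void dm_write_line(struct dual_map *m, size_t off, const void *src)
{
    __m128i *d = (__m128i *)(m->wr + off);
    const __m128i *s = (const __m128i *)src;

    for (int i = 0; i < 4; i++)
        _mm_stream_si128(d + i, _mm_loadu_si128(s + i));
    _mm_sfence();              /* push the streaming stores out          */
    _mm_clflush(m->rd + off);  /* invalidate the line in the read alias  */
    _mm_mfence();              /* order the flush before later loads     */
}

/* Read 8 bytes after flushing the line, for values the device may have
 * changed on its own. off must be 8-byte aligned. */
static uint64_t dm_read64_fresh(struct dual_map *m, size_t off)
{
    _mm_clflush(m->rd + off);
    _mm_mfence();
    return *(const volatile uint64_t *)(m->rd + off);
}
```

A read that does not need to observe device-side changes can skip the flush and simply load through `rd`, which is where the caching benefit would come from.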

My team did this a few years ago when I was at AMD, and now I'm trying to figure out how to do it with newer Linux kernels and Intel processors. Linux does not directly support the WP or WT memory types with its pre-defined mapping functions, so some hacking is required. It is quite easy to override the MTRR for the region, but I am having more trouble finding the right place(s) in the descendants of the remap_pfn_range() function that I need to change to get the WP or WT attribute set in the PAT entries for the range.

This method is probably better suited to FPGAs than to other (pre-defined) types of IO devices, since the programmability of the FPGA gives you the flexibility to define the PCI BARs to operate in this double-mapped mode and to cooperate with a processor-side driver that enforces coherence.


Source: https://habr.com/ru/post/919288/

