My colleagues and I are working with one of our PCIe-based products, and we have found that some platform or chipset dependency prevents interrupts from being delivered to our Linux kernel driver (rapafp). This is an older version of the product that we must continue to support in the field, and it was adapted from an earlier PCI design. The board carries several FPGAs, one of which has a 32-bit, 66 MHz PCI interface that connects to a Texas Instruments XIO PCI-to-PCIe bridge. I have been studying this relentlessly for several days and I am getting nowhere. We did consider a hardware fault in our own device, but swapping in several different cards made no difference.
Reference system that works
We have a RHEL6.5 system that works great, so we use it as a reference. The platform information follows. I don't know what level of detail you will need, and I don't want to clutter the question. Please let me know what else would be useful and how to provide it (inline in the question, pastebin, etc.).
From uname -a:
Linux DL-2-107.localdomain 2.6.32-431.el6.i686
From /proc/interrupts:
               CPU0        CPU1
    ...
     16:  609672457  1344098703  IO-APIC-fasteoi  uhci_hcd:usb3, pata_jmicron, rapafp
Information from dmesg:
    rapafp driver version 3.3.0.5
    rapafp: Requesting IRQ 16
    TSI: rapafp0 (BusID 2:0:0) is RAPTOR 4000 @ 2048x2048
    TSI: rapafp1 (BusID 2:0:0) is RAPTOR 4000 @ 1280x1024
From lspci:
    # lspci -t
    -[0000:00]-+-00.0
               +-01.0-[01-02]----00.0-[02]----00.0

    00:01.0 PCI bridge: Intel Corporation 82Q35 Express PCI Express Root Port (rev 02) (prog-if 00 [Normal decode])
    01:00.0 PCI bridge: Texas Instruments XIO2000(A)/XIO2200A PCI Express-to-PCI Bridge (rev 03) (prog-if 00 [Normal decode])
    02:00.0 Display controller: Tech-Source Device 0042
Installed CPU:
    model name: Intel(R) Core(TM)2 CPU E8400 @ 3.00GHz
Some BIOS information from dmidecode:
    Vendor: Phoenix Technologies, LTD
    Version: 6.00 PG
    Release Date: 12/12/2008
Note that the driver was never written with fasteoi in mind, so it never makes any explicit end-of-interrupt calls. Nevertheless, it works flawlessly on this machine.
A system that cannot receive any interrupts for our driver
We have two systems with interrupt problems. One is running RHEL6.5 (2.6.32-431.el6.i686), and the other is RHEL7.4 (3.10.0-693.17.1.el7.x86_64).
The RHEL6 system does receive interrupts for our driver, but only intermittently. This is probably because the kernel attaches the device to an edge-triggered interrupt line (even though the driver requests the opposite!), and the driver was not written to be compatible with edge triggering.
The RHEL7 system delivers no interrupts to our driver at all. Our goal is to port the driver to RHEL7, so I will focus on that machine. The two failing hosts have a lot in common with each other and differ from the reference system. The main differences that matter are the kernel version, 32-bit versus 64-bit, and possibly the BIOS. To start, here is some system information.
From uname -a:
Linux rhel74.techsource.com 3.10.0-693.17.1.el7.x86_64
From /proc/interrupts:
10: 0 0 IO-APIC-edge rapafp
From dmesg:
    [321790.744110] raptor_attach: irq_set_irq_type(10,8) succeeded!
    [321790.744111] raptor_attach: calling request_irq.
    [321790.744239] raptor_attach: request_irq(10) succeeded!
    [321790.744240] raptor_attach: done
    [321790.744342] TSI: rapafp0 (BusID 2:0:0) is RAPTOR 4000 @ 2048x2048
    ...
    [321807.840300] PCI Config Register dump:
    [321807.840405] vendor id 0x1227
    [321807.840508] device id 0x43
    [321807.840611] command register 0x202
    [321807.840715] status register 0x2a0
    [321807.840818] revision id 0x0
    [321807.840921] programming class code 0x0
    [321807.841025] sub-class code 0x80
    [321807.841129] basic class code 0x3
    [321807.841232] header type 0x0
    [321807.841335] base register 0 0xbfff0008
    [321807.841439] base register 1 0xa0000008
    [321807.841542] base register 2 0xb8000008
    [321807.841645] base register 3 0x0
    [321807.841749] base register 4 0xbffc0008
    [321807.841852] base register 5 0x0
    [321807.841955] Cardbus CIS Pointer 0x0
    [321807.842059] Subsystem Vendor ID 0x1227
    [321807.842162] Subsystem ID 0x43
    [321807.842266] ROM base register 0x0
    [321807.842369] interrupt line 0xa
    [321807.842472] interrupt pin 0x1
    [321807.842576] minimum grant 0x0
    [321807.842679] maximum grant 0x0
Information from lspci:
    # lspci -t
    -[0000:00]-+-00.0
               +-01.0-[01-02]----00.0-[02]----00.0

    00:00.0 Host bridge: Intel Corporation 82X38/X48 Express DRAM Controller (rev 01)
        Subsystem: Holco Enterprise Co, Ltd/Shuttle Computer Device 3111
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
        ...
    00:01.0 PCI bridge: Intel Corporation 82X38/X48 Express Host-Primary PCI Express Bridge (rev 01) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 24
        ...
    01:00.0 PCI bridge: Texas Instruments XIO2000(A)/XIO2200A PCI Express-to-PCI Bridge (rev 03) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        ...
    02:00.0 Display controller: Tech-Source Device 0043
        Subsystem: Tech-Source Device 0043
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B+ DisINTx-
        Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 10
Attempted solutions
I have tried a series of fixes. The first thing I did was go through the interrupt handling code and rewrite it to be friendlier to an edge-triggered interrupt line, but this had no effect. Other things I have done include:
- There was no pci_enable_device call, so I added one. No effect.
- I noticed that our call to request_irq used obsolete flags starting with SA_, so I replaced them with the new ones starting with IRQF_. I tried all kinds of flag combinations: IRQF_TRIGGER_RISING, IRQF_TRIGGER_FALLING, IRQF_TRIGGER_HIGH, IRQF_TRIGGER_LOW, each with and without IRQF_SHARED, etc. None of these affected IRQ delivery, what /proc/interrupts reported, or the bridge configuration reported by lspci. However, request_irq never returned an error code.
- I tried calling enable_irq and set_irq_type. No matter what I passed them, there was no effect. No error codes were returned.
Eventually I noticed that PCI bridge 00:01.0 has legacy interrupts disabled (DisINTx+). I went hunting for an existing function that would walk the bridge hierarchy and enable interrupts on every bridge, but I could not find one. So I decided to experiment.
First, I wrote my own function that walks up the bridge hierarchy:
    /*
     * Walk from the device up through every upstream bridge and try to
     * clear the INTx-disable bit in each command register.
     * Returns the number of devices whose DisINTx bit went from set to clear.
     */
    static int raptor_enable_intx(struct pci_dev *dev, TspciPtr pTspci)
    {
        int num_en = 0;
        u16 cmd, old_cmd;

        while (dev) {
            pci_read_config_word(dev, PCI_COMMAND, &old_cmd);
            pci_intx(dev, true);    /* clears PCI_COMMAND_INTX_DISABLE */
            pci_read_config_word(dev, PCI_COMMAND, &cmd);

            if (cmd & PCI_COMMAND_INTX_DISABLE) {
                printk(KERN_INFO "raptor_enable_intx: Could not clear DisINTx for device %s\n", pci_name(dev));
            } else {
                printk(KERN_INFO "raptor_enable_intx: Successfully cleared DisINTx for device %s\n", pci_name(dev));
                if (old_cmd & PCI_COMMAND_INTX_DISABLE)
                    num_en++;
            }

            dev = pci_upstream_bridge(dev);    /* NULL once we pass the root bus */
        }

        return num_en;
    }
The main effect of this was to make the machine hang, although not immediately. I tried calling request_irq both before and after raptor_enable_intx. IIRC, one ordering had no effect, while the other made the system hang, although not immediately.
I also found pci_common_swizzle, with some comments saying that it is required by the PCI standard, so I now call it after the function above and only then call request_irq. With these changes, the system freezes immediately on insmod.
Of course, I understand that iterating over the bridges and forcibly clearing PCI_COMMAND_INTX_DISABLE is a disgusting hack, and I would not be surprised if this, or the swizzle, is what causes the system to freeze.
In any case, I am lost and confused. Does anyone know what I'm doing wrong? How can I get this system's bridge to let legacy interrupts through?
Thanks in advance for your help!