		The MSI Driver Guide HOWTO
	Tom L Nguyen  tom.l.nguyen@intel.com
			10/03/2003
	Revised Feb 12, 2004 by Martine Silbermann
		email: Martine.Silbermann@hp.com

1. About this guide

This guide describes the basics of Message Signaled Interrupts (MSI),
the advantages of using MSI over traditional interrupt mechanisms,
and how to enable your driver to use MSI or MSI-X. Also included is
a Frequently Asked Questions section.

2. Copyright 2003 Intel Corporation

3. What is MSI/MSI-X?

Message Signaled Interrupt (MSI), as described in the PCI Local Bus
Specification Revision 2.3 or later, is an optional feature, and a
required feature for PCI Express devices. MSI enables a device
function to request service by sending an Inbound Memory Write on
its PCI bus to the FSB as a Message Signaled Interrupt transaction.
Because MSI is generated in the form of a Memory Write, all
transaction termination conditions, such as a Retry, Master-Abort,
Target-Abort or normal completion, are supported.

A PCI device that supports MSI must also support the pin IRQ
assertion interrupt mechanism to provide backward compatibility for
systems that do not support MSI. In systems which support MSI, the
bus driver is responsible for initializing the message address and
message data of the device function's MSI/MSI-X capability
structure during device initial configuration.

An MSI-capable device function indicates MSI support by
implementing the MSI/MSI-X capability structure in its PCI
capability list. The device function may implement both the MSI
capability structure and the MSI-X capability structure; however,
the bus driver should not enable both, but instead enable only the
MSI-X capability structure.

The MSI capability structure contains the Message Control register,
the Message Address register and the Message Data register. These
registers provide the bus driver control over MSI. The Message
Control register indicates the MSI capability supported by the
device.
The Message Address register specifies the target address and the
Message Data register specifies the characteristics of the message.
To request service, the device function writes the content of the
Message Data register to the target address. The device and its
software driver are prohibited from writing to these registers.

The MSI-X capability structure is an optional extension to MSI. It
uses an independent and separate capability structure. There are
some key advantages to implementing the MSI-X capability structure
over the MSI capability structure, as described below:

	- Support for a larger maximum number of vectors per
	  function.

	- The ability for system software to configure each vector
	  with an independent message address and message data,
	  specified by a table that resides in Memory Space.

MSI and MSI-X both support per-vector masking. Per-vector masking
is an optional extension of MSI but a required feature for MSI-X.
Per-vector masking gives the kernel the ability to mask/unmask an
MSI while servicing its software interrupt service routine handler.
If per-vector masking is not supported, then the device driver must
provide the hardware/software synchronization needed to ensure that
the device generates MSI only when the driver wants it to do so.

4. Why use MSI?

As a benefit to the simplification of board design, MSI allows
board designers to remove out-of-band interrupt routing. MSI is
another step towards a legacy-free environment.

Due to increasing pressure on chipset and processor packages to
reduce pin count, the need for interrupt pins is expected to
diminish over time. Devices, due to pin constraints, may implement
messages to increase performance.

PCI Express endpoints use INTx emulation (in-band messages) instead
of IRQ pin assertion. Using INTx emulation requires interrupt
sharing among devices connected to the same node (PCI bridge),
while MSI is unique (non-shared) and does not require BIOS
configuration support.
As a result, the PCI Express technology requires MSI support for
better interrupt performance. Using MSI enables the device function
to support two or more vectors, which can be configured to target
different CPUs to increase scalability.

5. Configuring a driver to use MSI/MSI-X

By default, the kernel will not enable MSI/MSI-X on all devices
that support this capability. The CONFIG_PCI_USE_VECTOR kernel
option must be selected to enable MSI/MSI-X support.

5.1 Including MSI support into the kernel

To allow MSI-capable device drivers to selectively enable MSI
(using pci_enable_msi as described below), the VECTOR based scheme
needs to be enabled by setting CONFIG_PCI_USE_VECTOR. Since the
target of the inbound message is the local APIC, providing
CONFIG_PCI_USE_VECTOR depends on whether CONFIG_X86_LOCAL_APIC is
enabled or not.

int pci_enable_msi(struct pci_dev *)

Any existing device driver that would like to have MSI enabled on
its device function must call this API explicitly. A successful
call will initialize the MSI/MSI-X capability structure with ONE
vector, regardless of whether the device function is capable of
supporting multiple messages. This vector replaces the pre-assigned
dev->irq with a new MSI vector. To avoid a conflict between the
newly assigned vector and the existing pre-assigned vector, the
device driver is required to call this API before calling
request_irq(...).

The diagram below shows the events which switch the interrupt mode
of an MSI-capable device function between MSI mode and PIN-IRQ
assertion mode.

	 ------------   pci_enable_msi   ------------------------
	|            | <===============  |                        |
	|  MSI MODE  |                   | PIN-IRQ ASSERTION MODE |
	|            |  ===============> |                        |
	 ------------      free_irq       ------------------------

5.2 Configuring for MSI support

Due to the non-contiguous fashion of vector assignment in the
existing Linux kernel, this version does not support multiple
messages, regardless of whether the device function is capable of
supporting more than one vector.
The bus driver initializes only entry 0 of this capability if
pci_enable_msi(...) is called successfully by the device driver.

5.3 Configuring for MSI-X support

Both the MSI capability structure and the MSI-X capability
structure share the same semantics described above; however,
because system software can configure each vector of the MSI-X
capability structure with an independent message address and
message data, the non-contiguous fashion of vector assignment in
the existing Linux kernel has no impact on supporting multiple
messages on an MSI-X capable device function.

By default, as mentioned above, ONE vector is always allocated to
the MSI-X capability structure at entry 0. The bus driver does not
initialize other entries of the MSI-X table. Note that the PCI
subsystem should have full control of the MSI-X table that resides
in Memory Space. The software device driver should not access this
table.

To request additional vectors, the device software driver should
call the function msi_alloc_vectors(). It is recommended that the
software driver call this function once during the initialization
phase of the device driver. The function msi_alloc_vectors(), once
invoked, enables either all or none of the requested vectors,
depending on the current availability of vector resources. If no
vector resources are available, the device function still works
with ONE vector. If vector resources are available for the number
of vectors requested by the driver, this function will reconfigure
the MSI-X capability structure of the device with additional
messages, starting from entry 1. For example, the device may be
capable of supporting a maximum of 32 vectors while its software
driver may typically request only 4 vectors.

For each allocated vector, after this successful call, the device
driver is responsible for calling other functions like
request_irq(), enable_irq(), etc. to enable the vector with its
corresponding interrupt service handler.
It is the device driver's choice to have all vectors share the same
interrupt service handler or to give each vector a unique interrupt
service handler.

In addition to the function msi_alloc_vectors(), another function
msi_free_vectors() is provided to allow the software driver to
release a number of vectors back to the vector resources. Once
invoked, the PCI subsystem disables (masks) each vector released.
These vectors are no longer valid for the hardware device and its
software driver to use. Like free_irq, it is recommended that the
device driver call msi_free_vectors to release all additional
vectors previously requested.

int msi_alloc_vectors(struct pci_dev *dev, int *vector, int nvec)

This API enables the software driver to request additional messages
from the PCI subsystem. Depending on the number of vectors
available, the PCI subsystem enables either all or none of them.

Argument dev points to the device (pci_dev) structure. Argument
vector is a pointer of integer type; the number of elements is
indicated by argument nvec. Argument nvec is an integer indicating
the number of messages requested. A return of zero indicates that
the requested number of vectors was successfully allocated;
otherwise, it indicates that the resources are not available.

int msi_free_vectors(struct pci_dev* dev, int *vector, int nvec)

This API enables the software driver to inform the PCI subsystem
that it is willing to release a number of vectors back to the MSI
resource pool. Once invoked, the PCI subsystem disables each MSI-X
entry associated with each vector stored in argument 2. These
vectors are no longer valid for the hardware device and its
software driver to use.

Argument dev points to the device (pci_dev) structure. Argument
vector is a pointer of integer type; the number of elements is
indicated by argument nvec. Argument nvec is an integer indicating
the number of messages released. A return of zero indicates that
the vectors were successfully released.
Otherwise, it indicates a failure.

5.4 Hardware requirements for MSI support

MSI support requires support from both system hardware and
individual hardware device functions.

5.4.1 System hardware support

Since the target of the MSI address is the local APIC of a CPU,
enabling MSI support in the Linux kernel depends on whether the
existing system hardware supports a local APIC. Users should verify
that their system runs when CONFIG_X86_LOCAL_APIC=y.

In an SMP environment, CONFIG_X86_LOCAL_APIC is automatically set;
however, in a UP environment, users must manually set
CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting
CONFIG_PCI_USE_VECTOR enables the VECTOR based scheme and the
option for MSI-capable device drivers to selectively enable MSI
(using pci_enable_msi as described above).

Note that the CONFIG_X86_IO_APIC setting is irrelevant because the
MSI vector is newly allocated at runtime and MSI support does not
depend on BIOS support. This key independence enables MSI support
on future IOxAPIC-free platforms.

5.4.2 Device hardware support

A hardware device function supports MSI by indicating the MSI/MSI-X
capability structure on its PCI capability list. By default, this
capability structure will not be initialized by the kernel to
enable MSI during system boot. In other words, the device function
runs in its default pin assertion mode. Note that in many cases
hardware supporting MSI has bugs which may result in a system hang.
The software driver of specific MSI-capable hardware is responsible
for deciding whether to call pci_enable_msi or not. A return of
zero indicates that the kernel successfully initialized the
MSI/MSI-X capability structure of the device function. The device
function is now running in MSI mode.

5.5 How to tell whether MSI is enabled on a device function

At the driver level, a return of zero from pci_enable_msi(...)
indicates to the device driver that its device function is
initialized successfully and ready to run in MSI mode.
At the user level, users can use the command 'cat /proc/interrupts'
to display the vectors allocated for the device and its interrupt
mode, as shown below.

	           CPU0       CPU1
	  0:     324639          0    IO-APIC-edge   timer
	  1:       1186          0    IO-APIC-edge   i8042
	  2:          0          0          XT-PIC   cascade
	 12:       2797          0    IO-APIC-edge   i8042
	 14:       6543          0    IO-APIC-edge   ide0
	 15:          1          0    IO-APIC-edge   ide1
	169:          0          0   IO-APIC-level   uhci-hcd
	185:          0          0   IO-APIC-level   uhci-hcd
	193:        138         10         PCI MSI   aic79xx
	201:         30          0         PCI MSI   aic79xx
	225:         30          0   IO-APIC-level   aic7xxx
	233:         30          0   IO-APIC-level   aic7xxx
	NMI:          0          0
	LOC:     324553     325068
	ERR:          0
	MIS:          0

6. FAQ

Q1. Are there any limitations on using MSI?

A1. If the PCI device supports MSI and conforms to the
specification, and the platform supports the APIC local bus, then
using MSI should work.

Q2. Will it work on all Pentium processors (P3, P4, Xeon, AMD
processors)? In P3, IPIs are transmitted on the APIC local bus, and
in P4 and Xeon they are transmitted on the system bus. Are there
any implications with this?

A2. MSI support enables a PCI device to send an inbound memory
write (0xfeexxxxx as the target address) on its PCI bus directly to
the FSB. Since the message address has the redirection hint bit
cleared, it should work.

Q3. The target address 0xfeexxxxx will be translated by the Host
Bridge into an interrupt message. Are there any limitations on
chipsets such as the Intel 8xx, Intel e7xxx, or VIA?

A3. If these chipsets support an inbound memory write with the
target address set as 0xfeexxxxx, conforming to the PCI
specification 2.3 or later, then it should work.

Q4. From the driver point of view, if the MSI is lost because
errors occur during the inbound memory write, then it may wait
forever. Is there a mechanism for it to recover?

A4. Since the transaction is an inbound memory write, all
transaction termination conditions (Retry, Master-Abort,
Target-Abort, or normal completion) are supported. A device sending
an MSI must abide by all the PCI rules and conditions regarding
that inbound memory write.
So, if a retry is signaled it must retry, etc. We believe that the
recommendation for an Abort is also a retry (refer to the PCI
specification 2.3 or later).