This document examines the success of the widely adopted PCI bus and describes the higher-performance next generation of I/O interconnect technology – PCI Express – that will serve as a standard local I/O bus for a wide variety of future computing platforms. This paper also offers a technical overview of the evolution of PC buses, the physical and software layers of PCI Express, the benefits of PCI Express, and the implications this exciting new technology will have on measurement and automation systems.
2. PC History
When the PCI bus was first introduced in the early 1990s, it had a unifying effect on the plethora of I/O buses available on PCs at that time, such as the VESA local bus, EISA, ISA, and Micro Channel, as shown in Figure 1. It was first implemented as a chip-to-chip interconnect and a replacement for the fragmented ISA bus. During these early years, the 33 MHz PCI bus was a good match for the I/O bandwidth requirements of mainstream peripherals. Today, however, the story is quite different. Processor and memory frequencies have increased dramatically, with processor speeds rising at the most aggressive rate. Over the intervening period, the PCI bus has increased in frequency from 33 to 66 MHz, while processor speeds have increased from 33 MHz to 3 GHz. Emerging I/O technologies such as Gigabit Ethernet and IEEE 1394b can, as a single device on the bus, monopolize nearly all of the available PCI bus bandwidth.
Figure 1. Evolution of PC Buses
3. PCI Bus History and Overview
The PCI bus presented a number of advantages over previous bus implementations. Among the most important were processor independence, buffered isolation, bus mastering, and true plug-and-play operation. Buffered isolation essentially isolates, both electrically and by clock domains, the CPU local bus from the PCI bus. This feature adds two main benefits to system performance. The first is the ability to run concurrent cycles on the PCI bus and CPU bus; the second allows an increase in the CPU local bus frequency, independent of the PCI bus speed and loading. With bus mastering, PCI boards can gain access to the PCI bus through an arbitration process and master the bus transaction directly, as opposed to waiting for the host CPU to service the device, which results in a reduction of overall latency on servicing I/O transactions. Finally, plug-and-play operation, which permits devices to be automatically detected and configured, eliminated the manual setting of switches and jumpers for base address and DMA interrupts that frustrated ISA-based board users.
4. PCI Challenges
Although PCI has enjoyed great success, it now faces a series of challenges, including bandwidth limitations, host pin-count limitations, the lack of real-time data transfer services such as isochronous data transfers, and the lack of features to meet next-generation I/O requirements such as quality of service, power management, and I/O virtualization.
Since the introduction of PCI, there have been several revisions to the PCI specification to keep up with ever-increasing I/O demands. These are summarized in Table 1.
Table 1. PCI Bandwidth and Market Use
The usable bandwidth of the PCI bus and its derivatives can be significantly less than the theoretical bandwidth due to protocol overhead and bus topology. On the PCI bus, the available bandwidth is shared by all devices on the bus, resulting in lower bandwidth available per device as more devices are added.
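As a back-of-the-envelope illustration (not part of the original specification text), the effect of sharing can be modeled with a simple calculation: a 32-bit, 33 MHz PCI bus offers roughly 132 MB/s in total, divided among however many devices are actively transferring.

```python
# Illustrative sketch: per-device bandwidth on a shared parallel bus.
# Figures are nominal theoretical rates, ignoring protocol overhead.

def shared_bus_bandwidth(total_mb_s: float, num_devices: int) -> float:
    """On classic PCI, all devices share one bus, so each active device
    sees only a fraction of the theoretical bandwidth."""
    return total_mb_s / num_devices

pci_total = 4 * 33  # 32-bit bus at 33 MHz: 4 bytes x 33e6 ~ 132 MB/s
print(shared_bus_bandwidth(pci_total, 4))  # four active devices -> 33.0 MB/s each
```

With four busy devices, each sees only about a quarter of an already modest total, which is why a single Gigabit Ethernet adapter can starve its neighbors.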
As PCI clock frequencies have become inadequate for certain applications, PCI derivatives such as PCI-X and the accelerated graphics port (AGP) have sought to provide bandwidth relief by increasing bus frequencies. Increasing the frequency, however, commensurately reduces the distance the bus can be routed and the number of connectors the bus transceivers can drive. This leads to the concept of dividing the PCI bus into multiple segments, each of which requires a full PCI-X bus to be routed from the host driving silicon to each active slot. For example, 64-bit PCI-X requires 150 pins for each segment. Clearly this is costly to implement and places strain on routing, board layer count, and chip package pinouts. The extra cost is justified only where the bandwidth is crucial, such as in servers.
Applications such as data acquisition, waveform generation, and streaming audio and video require guaranteed bandwidth and deterministic latency; without them, the user experiences glitches. The original PCI specification did not address these issues because such applications were not prevalent when the specification was developed. Today's isochronous data streams, such as high-definition uncompressed video and audio, demonstrate the need for the I/O system to include isochronous transfer capabilities. A side benefit of isochronous transfers is that PCI Express boards need far less local memory for buffering than typical PCI boards, which use large buffers to smooth out bandwidth variations.
Finally, next-generation I/O requirements such as quality of service and power management improve data integrity and permit the selective powering down of system devices – an important consideration as the power consumed by modern PCs continues to grow. Virtual channels permit data to be routed over separate virtual routes, so transfers can proceed even if other channels are blocked by outstanding transactions. Some of these features require software control beyond traditional PCI requirements and cannot be used until OS and device driver support arrives.
PCI Express, the next generation of the PCI bus, was introduced in 2004 to overcome these challenges. Today, most PCs ship with a combination of PCI and PCI Express slots, and it won't be long before the PCI bus is completely phased out.
5. PCI Express Architecture
The PCI Express architecture is specified in layers, as shown in Figure 2. Compatibility with the PCI addressing model (a load-store architecture with a flat address space) is maintained to ensure that all existing applications and drivers operate unchanged. The PCI Express configuration uses standard mechanisms defined in the PCI plug-and-play specification. The software layers generate read and write requests that are transported by the transaction layer to the I/O devices using a packet-based, split-transaction protocol. The link layer adds sequence numbers and CRC to these packets to create a highly reliable data transfer mechanism. The basic physical layer consists of a dual simplex channel implemented as a transmit pair and a receive pair. The transmit and receive pair together are called a lane. The initial speed of 2.5 Gb/s provides a nominal bandwidth of about 250 MB/s in each direction per PCI Express lane. Once overhead is taken into account, about 200 MB/s of this is usable by the device for data movement. This rate represents a twofold to fourfold increase over most classic PCI boards. And unlike PCI, where the bus bandwidth was shared among devices, this bandwidth is provided to each device.
Figure 2. PCI Express Layered Architecture
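The 250 MB/s per-lane figure quoted above follows directly from the signaling rate and the 8b/10b encoding used on the physical link; a minimal sketch of the arithmetic:

```python
def lane_bandwidth_mb_s(signaling_gb_s: float = 2.5) -> float:
    """Nominal per-direction bandwidth of one PCI Express lane.
    8b/10b encoding carries 8 data bits in every 10 line bits."""
    line_bits = signaling_gb_s * 1e9   # raw bits on the wire per second
    data_bits = line_bits * 8 / 10     # remove 8b/10b encoding overhead
    return data_bits / 8 / 1e6         # bits -> bytes -> megabytes

print(lane_bandwidth_mb_s())  # -> 250.0
```

The remaining gap down to the roughly 200 MB/s of usable device bandwidth comes from packet headers, CRC, and flow-control traffic, which this simple calculation does not model.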
The fundamental PCI Express link consists of two low-voltage AC-coupled differential pairs of signals (a transmit pair and a receive pair) as shown in Figure 3. The physical link signal uses a deemphasis scheme to reduce intersymbol interference, thus improving data integrity. A data clock is embedded using the 8b/10b encoding scheme to achieve very high data rates. The initial signaling frequency is 2.5 Gb/s/direction (Generation 1 signaling), and this is expected to increase in line with advances in silicon technology to 10 Gb/s/direction (the practical maximum for signals in copper). The physical layer transports packets between the link layers of two PCI Express agents.
Figure 3. PCI Express Physical Link Diagram
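To make the DC-balance property of 8b/10b concrete, the following sketch (using made-up 10-bit groups, not actual 8b/10b codewords) shows how alternating positive- and negative-disparity symbols keep the overall stream balanced:

```python
def disparity(bits: str) -> int:
    """Ones minus zeros in a bit string; an AC-coupled link needs this
    to stay near zero over time (DC balance)."""
    return bits.count("1") - bits.count("0")

# Two hypothetical 10-bit code groups with opposite disparity:
positive = "1101001110"   # 6 ones, 4 zeros -> disparity +2
negative = "0010110001"   # 4 ones, 6 zeros -> disparity -2

print(disparity(positive + negative))  # -> 0, balanced overall
```

Real 8b/10b hardware tracks a running disparity and chooses between two encodings of each byte so that the stream stays balanced and rich in transitions for clock recovery.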
The bandwidth of a PCI Express link may be linearly scaled by adding signal pairs to form multiple lanes. The physical layer supports x1, x2, x4, x8, x12, x16, and x32 lane widths, conceptually splitting the incoming data packets among the lanes. Each byte is transmitted, with 8b/10b encoding, across the lane(s). This data disassembly and reassembly is transparent to the other layers. During initialization, each PCI Express link is set up following a negotiation of lane width and operating frequency by the two agents at each end of the link; no firmware or OS software is involved. The PCI Express architecture provides for future performance enhancements via speed upgrades and advanced encoding techniques; such changes to speed, encoding, or media will affect only the physical layer.
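The byte striping the physical layer performs across lanes can be sketched conceptually as follows (a simplification that ignores framing, skew compensation, and the 8b/10b step):

```python
def stripe(data: bytes, num_lanes: int) -> list:
    """Distribute a packet's bytes round-robin across the lanes."""
    lanes = [bytearray() for _ in range(num_lanes)]
    for i, byte in enumerate(data):
        lanes[i % num_lanes].append(byte)
    return lanes

def reassemble(lanes: list) -> bytes:
    """Receiver interleaves the lanes back into the original stream."""
    total = sum(len(lane) for lane in lanes)
    return bytes(lanes[i % len(lanes)][i // len(lanes)] for i in range(total))

packet = b"ABCDEFGH"
lanes = stripe(packet, 4)           # lane 0 carries b"AE", lane 1 b"BF", ...
assert reassemble(lanes) == packet  # transparent to the upper layers
```

Because disassembly and reassembly are confined to the physical layer, the transaction and link layers see one logical byte stream regardless of how many lanes were negotiated.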
The use of different lane widths in PCI Express requires the user to match the width required by an expansion board to the lane width provided by the motherboard. Other than graphics boards, which tend to be x16, most PCI Express expansion boards currently use the x1 width; as higher bandwidth is required, more boards will use the wider widths. The combination of x1, x4, x8, and x16 slots varies with the intended market for the computer. PCI Express permits limited interoperability between mismatched lane widths, depending on the direction of the mismatch. Using a larger-width expansion board in a smaller-width connector is referred to as down-plugging. With PCI, for example, you can insert a 64-bit PCI board in a 32-bit slot; in PCI Express, however, down-plugging is physically prevented by the design of the expansion board and connectors. The other mismatch – using a smaller-width expansion board in a larger-width connector – is up-plugging. Up-plugging is allowed, with the caveat that the motherboard vendor is required to support the expansion board at only a x1 data rate in this configuration, potentially wasting the investment in the board's faster interface. Whether a particular motherboard can run an up-plugged expansion board at its full data rate must be confirmed on a case-by-case basis with the motherboard manufacturer. For example, some motherboards can run a x4 expansion board at its full x4 data rate when it is plugged into a x8 or x16 slot, while other motherboards from the same vendor might run the board only at x1. Finally, when a motherboard has both an integrated (onboard) graphics controller and a x16 PCI Express slot for future graphics expansion, it is normally not possible to use that x16 slot while the onboard graphics are enabled.
Data Link Layer
The primary role of the link layer is to ensure reliable delivery of packets across the PCI Express link(s). The link layer is responsible for data integrity and adds a sequence number and a CRC to each transaction layer packet, as shown in Figure 4. Most packets are initiated at the transaction layer. A credit-based flow-control protocol ensures that a packet is transmitted only when a buffer is known to be available to receive it at the other end, eliminating packet retries – and the associated waste of bus bandwidth – due to resource constraints. The link layer automatically retries any packet that was signaled as corrupted.
Figure 4. The data link layer adds data integrity features.
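The credit-based scheme can be sketched as a toy model (the names and structure here are illustrative, not the actual protocol state machine):

```python
class CreditedLink:
    """Toy model of credit-based flow control: a packet is sent only when
    the receiver has advertised a free buffer, so nothing is ever retried
    on the wire for lack of buffer space."""

    def __init__(self, advertised_buffers: int):
        self.credits = advertised_buffers  # credits granted at link init

    def try_send(self, packet: str) -> bool:
        if self.credits == 0:
            return False       # hold the packet locally; no buffer free
        self.credits -= 1      # each transmitted packet consumes a credit
        return True

    def buffer_freed(self) -> None:
        self.credits += 1      # receiver returns a credit as it drains

link = CreditedLink(advertised_buffers=2)
assert link.try_send("TLP-1") and link.try_send("TLP-2")
assert not link.try_send("TLP-3")  # blocked at the sender, not retried
link.buffer_freed()
assert link.try_send("TLP-3")
```

The key property is that back-pressure is exerted at the sender before a packet ever reaches the wire, so no link bandwidth is wasted on transfers the receiver cannot accept.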
Transaction Layer
The transaction layer receives read and write requests from the software layer and creates request packets for transmission to the link layer. All requests are implemented as split transactions, and some request packets require a response packet. The transaction layer also receives response packets from the link layer and matches them with the original software requests. Each packet has a unique identifier that enables response packets to be directed to the correct originator. The packet format supports 32-bit memory addressing and extended 64-bit memory addressing. Packets also have attributes such as “no-snoop,” “relaxed ordering,” and “priority,” which may be used to route them optimally through the I/O subsystem.
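The identifier-based matching of responses to requests can be illustrated with a small sketch (class and field names are hypothetical, not taken from the specification):

```python
class RequestTracker:
    """Sketch of split-transaction bookkeeping: each request carries a
    unique tag, and completions are matched back to their originator."""

    def __init__(self):
        self.next_tag = 0
        self.outstanding = {}  # tag -> original request address

    def issue_read(self, address: int) -> dict:
        tag, self.next_tag = self.next_tag, self.next_tag + 1
        self.outstanding[tag] = address
        return {"type": "MemRd", "tag": tag, "addr": address}

    def complete(self, completion: dict):
        """Direct the response to the correct originator by its tag."""
        address = self.outstanding.pop(completion["tag"])
        return address, completion["data"]

tracker = RequestTracker()
request = tracker.issue_read(0x1000)
addr, data = tracker.complete({"tag": request["tag"], "data": 0xABCD})
assert addr == 0x1000 and not tracker.outstanding
```

Because the link is not held open between request and completion, many reads can be outstanding at once, which is what makes split transactions efficient.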
The transaction layer provides four address spaces – three PCI address spaces (memory, I/O, and configuration) plus message space. PCI 2.2 introduced an alternative method of propagating system interrupts, called message signaled interrupts (MSI), in which a special-format memory-write transaction replaces a hard-wired sideband signal; in a PCI 2.2 system this was an optional capability. The PCI Express specification reuses the MSI concept as the primary method for interrupt processing and uses the message space to carry all prior sideband signals – interrupts, power-management requests, and resets – as in-band messages. Other “special cycles” in the PCI 2.2 specification, such as interrupt acknowledge, are also implemented as in-band messages. You can think of PCI Express messages as “virtual wires,” because their effect is to eliminate the wide array of sideband signals currently used in a platform implementation.
Software Layers
Software compatibility is of paramount importance for PCI Express. There are two facets of software compatibility – initialization (or enumeration) and run time. PCI has a powerful initialization model in which the OS can discover all of the add-in hardware devices present and then allocate system resources, such as memory, I/O space, and interrupts, to create an optimal system environment. The PCI configuration space and the programmability of I/O devices are key concepts that remain unchanged within the PCI Express architecture; in fact, all OSs can boot without modification on a PCI Express-based machine. The run-time software model used by PCI is a load-store, shared-memory model, which is maintained within the PCI Express architecture so that all existing software executes unchanged. New software can also take advantage of some of the more advanced features of PCI Express, such as advanced switching, that are not covered in this paper.
6. PC Architecture – Today and Future
PC Architecture in 2002 with PCI
The PC architecture in 2002 consisted of a number of diverging requirements for each of the interconnects. For instance, graphics boards were interfaced via the accelerated graphics port (AGP), and the memory bridge was connected to the I/O bridge via a number of interfaces, such as HubLink, as illustrated in Figure 5.
Figure 5. PC Architecture in 2002 with PCI (courtesy of Intel)
PC Architecture with PCI Express
As shown in Figure 6, PCI Express unifies the I/O system using a common bus architecture. In addition, PCI Express replaces some of the internal buses that link subsystems.
Figure 6. PC Architecture with PCI Express Implementation (courtesy of Intel)
7. PCI Express Packaging
PCI Express is available in a number of different I/O expansion formats, depending on the application platform – notebook, desktop, or server. Servers, which require larger bandwidths to service I/O requirements, have more PCI Express slots, and these slots provide higher PCI Express lane counts. In contrast, a notebook may use the PCI Express architecture internally but provide only a single x1 lane for medium-speed peripherals.
Desktop PCI Express Expansion Slots
The replacement for the PCI board in desktop and workstation machines has a mechanical structure very similar to today’s PCI boards, based on a board-edge connector and retaining bracket with I/O connectors protruding through the bracket and attached to the main printed wiring board (PWB). The connector on the motherboard has improved retention capabilities to ensure that the board does not become dislodged under vibration or during shipping. The board-edge connector is available in a number of different sizes, depending on PCI Express lane width, from x1 up to x16. Figure 7 shows a photo of a motherboard with four slots – one PCI Express x16, one PCI, one PCI Express x8, and one PCI-X (from bottom to top). Figure 8 shows a photo of a typical graphics board featuring a x16 link and the potential to move data at 3.2 GB/s. Figure 9 shows mechanical drawings of various PCI Express connectors.
Figure 7. Motherboard with Four Slots – PCI Express x16, PCI, PCI Express x8, and PCI-X (from bottom to top)
Figure 8. A Graphics Board with the x16 Interface
Figure 9. Mechanical Drawings of Various PCI Express Connectors
The ExpressCard standard offers a very easy way to add hardware or media to your systems. The primary market for ExpressCard modules is notebooks and small PCs needing only limited expansion. The ExpressCard module can be plugged in or removed at almost any time without any tools (unlike traditional add-in boards for desktop computers). ExpressCard technology provides desktop and mobile computer users a consistent, easy, and reliable way to incorporate devices into their systems.
ExpressCard technology replaces conventional parallel buses for I/O devices with two scalable, high-speed serial interfaces – PCI Express and USB 2.0. ExpressCard developers can create modules using PCI Express for their highest-performance applications or use USB to take advantage of the wide variety of USB silicon already available. Irrespective of the bus technology that the module vendor chooses, the end user experience is the same. There are no external indications to the end user of the underlying bus that the module is using.
There are two standard formats for ExpressCard modules – ExpressCard/34 (34 mm wide) and ExpressCard/54 (54 mm wide). Both modules are 5 mm thick, the same as the Type II PC Card. The standard module length is 75 mm, which is 10.6 mm shorter than a standard PC Card. Both the ExpressCard/34 and ExpressCard/54 modules use the same connector interface.
The two sizes of ExpressCard modules give system manufacturers a degree of flexibility that they did not have with earlier module standards. While the ExpressCard/34 device is better suited to smaller systems, the wider ExpressCard/54 module can accommodate devices that do not physically fit into the narrower ExpressCard/34 format. Figure 10 shows the two ExpressCard module sizes and compares them to the PCMCIA CardBus module. Examples of the larger ExpressCard/54 modules include SmartCard readers, CompactFlash readers, and 1.8 in. disk drives. In addition to providing extra space for components, the ExpressCard/54 module dissipates more thermal energy than the smaller ExpressCard/34 module. This feature may make ExpressCard/54 a natural choice for higher-performance and first-generation applications. However, a module manufacturer who can fit an application into the narrow module has an advantage in that the module works in both sizes of ExpressCard slots. To improve the ease of use, the ExpressCard/54 slot includes a guidance feature designed to steer ExpressCard/34 modules easily into the connector socket. It is also worth pointing out that the dimensions are such that inserting a CardBus card into an ExpressCard slot or vice versa does not damage either part.
Each slot of the ExpressCard host interface must include a single PCI Express lane (x1) operating at the baseline 2.5 Gb/s data rate, in each direction, as defined by the PCI Express Base Specification 1.0a. The ExpressCard host interface must also accept the low-, full-, and high-speed USB data rates as defined by the USB 2.0 Specification. Providing both interfaces is a condition for being an ExpressCard-compliant host platform. An ExpressCard module can use one or both of the standard interfaces depending on the application requirements.
Figure 10. ExpressCard Mechanical Formats with PCMCIA/CardBus for Comparison
Server I/O Modules
Servers need I/O adapter packaging that addresses items such as closed-chassis adapter removal/insertion, integrated hot-plug, adapter protection from ESD and handling damage, standardized management interfaces and features, adequate cooling, single bulk power supply, and reduced I/O footprint in servers. The Server I/O Module (SIOM) specification defines two PCI Express modular packages for I/O adapters, which are designed to provide closed-chassis installation and removal. They are also designed for native hot-plug, meaning that no damage occurs to the module or chassis from installing or removing the modules while the chassis is powered. The two formats, identical in height and depth, vary only by the width or slot space they consume in the chassis. The single-wide module fits in any SIOM slot. The double-wide module requires two adjacent SIOM slots.
A single-wide module can have up to a x8 PCI Express connection; a double-wide module can offer up to a x16 connection. (Double-wide slots are optional in PCI Express SIOM servers.) A minimum management interface, standard on all modules, consists of an EEPROM that provides management data about the adapter to the host system. An optional internal storage interface accepts up to four x1 ports of SAS/SATA and a sideband interface for controlling drive LEDs.
SIOMs and SIOM chassis share requirements for cooling. SIOM chassis are required to provide a minimum airflow to each slot as defined by this specification. SIOMs are required to provide a minimum and maximum airflow resistance to ensure that the SIOM has a defined airflow for I/O cooling and that I/O thermal load is not added to the chassis.
8. Benefits of PCI Express
For PC-based measurement and automation systems, the PCI bus has been the bus of choice for plug-in expansion boards for many years. As the PC has evolved, the PCI bus (with its parallel architecture) has not scaled linearly with the rest of the platform. PCI Express answers these issues and provides benefits across five main areas:
- High performance – Relates specifically to bandwidth, which is more than double that of PCI in a x1 link, and grows linearly as more lanes are added. An additional benefit that is not immediately evident is that this bandwidth is simultaneously available in both directions on each link. In addition, the initial signaling speed of 2.5 Gb/s is expected to increase, yielding further speed improvements.
- I/O simplification – Relates to the streamlining of the plethora of both chip-to-chip and internal user-accessible buses, such as AGP, PCI-X, and HubLink. This feature reduces the complexity of design and cost of implementation.
- Layered architecture – PCI Express establishes an architecture that can adapt to new technologies while preserving software investment. Two key areas that benefit from the layered architecture are the physical layer, with increased signaling rates, and software compatibility.
- Next-generation I/O – PCI Express provides new capabilities for data acquisition and multimedia through isochronous data transfers. Isochronous transfers provide a quality of service (QoS) guarantee that ensures on-time data delivery through deterministic, time-dependent methods.
- Ease of use – PCI Express greatly simplifies how users add to and upgrade their systems. It offers both hot-swap and hot-plug capability. In addition, the variety of formats for PCI Express boards, especially SIOM and ExpressCard, greatly increases the ability to add high-performance peripherals to servers and notebooks.
All of these features will ensure that the PC evolves into an increasingly attractive platform on which to base next-generation measurement and automation systems.
9. Glossary
8b/10b encoding – A scheme for encoding signals with an embedded clock. The encoding serves two purposes: first, it ensures that there are enough transitions in the data stream for clock recovery; second, it keeps the numbers of 0s and 1s matched, maintaining DC balance in AC-coupled systems.
Accelerated graphics port (AGP) – A higher-speed derivative of the PCI bus using a different connector, developed to accommodate the bandwidth needs of dedicated plug-in graphics cards in desktop PCs.
Cyclic redundancy check (CRC) – A method of detecting bit errors in a packet of information by appending a set of check values calculated from the original packet data.
Differential – Differential signaling uses two wires carrying complementary signals (180 degrees out of phase). The main benefit is a reduction in susceptibility to induced noise.
ExpressCard – A small I/O card including both PCI Express and USB 2.0 interfaces.
Industry Standard Architecture (ISA) – A bus standard for PCs introduced in 1984 that extended the XT bus architecture to 16 bits. It is designed to connect peripheral cards to the motherboard. It is also referred to as the AT bus.
PICMG – PCI Industrial Computer Manufacturers Group – The group of member companies that maintains current specifications for CompactPCI and PCI/ISA.
Peripheral component interconnect (PCI) – A high-speed parallel bus originally designed by Intel to connect I/O peripherals to a CPU.
PCI Express – An evolutionary version of PCI that maintains the PCI software usage model and replaces the physical bus with a high-speed (2.5 Gb/s) serial bus serving multiple lanes.
Server I/O module (SIOM) – An I/O module, designed for server and workstation applications, that uses PCI Express for communication.
USB 2.0 – An external differential point-to-point serial bus that provides data rates up to 480 Mb/s. USB 2.0 is an extension of USB 1.1 that uses the same cables and connectors.