Streaming Architecture of the Industry’s Highest Performance PXI Express Platform


Overview

Many applications, such as RF record and playback, electronic device validation, and high-channel-count data acquisition, generate tremendous amounts of data. Traditionally, benchtop instruments such as oscilloscopes, logic analyzers, and arbitrary waveform generators have implemented only limited data streaming. As instruments evolve, they may offer incredibly fast sampling rates and high signal bandwidths, but the bus that connects the instrument to the PC to return data to the user for processing or storage is often the bottleneck. The throughput of this data communication bus directly impacts access to the instrument's bandwidth and, as a result, overall test and measurement times.

As PC-based measurement hardware continues to adopt progressively higher performance data buses, it can not only address the needs of existing applications more efficiently but also meet the demands of new applications that previously could not be served. The evolution to the PCI Express/PXI Express bus enables even faster data transfers. Streaming from the instrument, through the controller, and onto hard disk increases the available memory of the instrument from megabytes to terabytes. By using high-bandwidth PXI Express bus architectures, data can stream to and from hard disk at a rate high enough to support high-end instrumentation. With faster read/write speeds and increased storage capacity, data streaming now enables faster sampling rates over longer test periods than ever before.

Typical Streaming Architecture

The key goal of a typical streaming architecture is to transmit data to or from an instrument fast enough to maintain a continuous generation or acquisition. When performing a generation task, the host computer retrieves data from memory and passes it through the communication bus to the instrument. The instrument then generates a physical signal from the data. An acquisition task works in the opposite direction, taking the data produced by the instrument and transmitting it through the bus to the host computer to be placed into memory. Depending on the technology chosen for the basic components and bus interfaces, multiple elements can introduce throughput bottlenecks into the system, reducing your effective streaming rate.
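The sketch below illustrates this pattern for an acquisition: a producer thread pulls fixed-size blocks across the bus while a consumer thread writes them to disk, so the slowest stage, not the sum of the stages, sets the sustained rate. This is a minimal, generic Python sketch; fake_read_block is a hypothetical placeholder for a real driver call, not an actual instrument API.

```python
# Minimal producer/consumer sketch of a streaming acquisition. The instrument
# read and the disk write run concurrently, so the slowest stage limits the
# sustained rate.
import queue
import threading

BLOCK_BYTES = 4 * 1024 * 1024           # 4 MB transfer blocks
buffers = queue.Queue(maxsize=16)       # bounded FIFO between the two stages

def acquire(read_block, num_blocks):
    """Producer: pull blocks from the instrument across the bus."""
    for _ in range(num_blocks):
        buffers.put(read_block(BLOCK_BYTES))   # blocks here if the disk falls behind
    buffers.put(None)                          # sentinel: acquisition finished

def record(path):
    """Consumer: stream blocks from host memory to disk."""
    with open(path, "wb") as f:
        while (block := buffers.get()) is not None:
            f.write(block)

def fake_read_block(nbytes):
    return bytes(nbytes)                       # placeholder for a real driver call

producer = threading.Thread(target=acquire, args=(fake_read_block, 16))
consumer = threading.Thread(target=record, args=("capture.bin",))
producer.start(); consumer.start()
producer.join(); consumer.join()
```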


Figure 1. Evaluate each interface in the system to maximize the streaming capabilities to meet application needs, such as storing and playing back recorded RF data.

Advancements of Streaming Architectures

The PXI-1 standard uses the Peripheral Component Interconnect (PCI) bus to exchange data between PXI modules in the chassis and the PXI controller. PCI is a parallel bus that, in its most common implementation, is 32 bits wide and runs at 33 MHz. Data acquired by a PXI module is transferred from onboard device memory across the PCI bus, through the I/O controller, across the internal bus, and into system memory (RAM). It can then be transferred from system memory, across the internal bus, onto one or more hard drives. Data generated by a PXI module follows the opposite path.


Figure 2. Data-streaming architecture of a PCI-based system, implemented between the PXI embedded controller and chassis.

Based on the specification, the theoretical maximum bandwidth of the PCI bus is 132 MB/s, which translates to roughly 110 MB/s of sustainable practical throughput. Because there is only a single link for all PCI devices to transfer data to and from the host controller, this 110 MB/s of practical bandwidth is shared across all devices. Thus, in a PXI system, all the modules in a chassis share the PCI bus bandwidth. As the performance of PXI instrumentation increases and applications evolve, the amount of data that must pass between the modules and the controller continues to grow, and in these applications the throughput capabilities of the PCI bus are quickly exceeded.
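As a rough illustration of how quickly that shared budget disappears, the short calculation below divides the practical PCI throughput cited above across an increasing number of modules; the module counts are arbitrary examples.

```python
# Back-of-the-envelope PCI bandwidth, assuming the common 32-bit, 33 MHz implementation.
bus_width_bytes = 32 // 8                                  # 4 bytes per clock
bus_clock_hz = 33e6                                        # 33 MHz
theoretical_mb_s = bus_width_bytes * bus_clock_hz / 1e6    # ~132 MB/s
practical_mb_s = 110                                       # sustained figure cited above

# PCI is a shared bus, so every module divides the same budget.
for modules in (1, 2, 4, 8):
    print(f"{modules} module(s): ~{practical_mb_s / modules:.0f} MB/s each "
          f"(of {theoretical_mb_s:.0f} MB/s theoretical)")
```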

PCI Express, an evolution of the PCI bus, maintains software compatibility with PCI but replaces the parallel bus with a high-speed (2.5 Gbit/s) serial bus. PCI Express sends data through differential signal pairs called lanes, which offer 250 MB/s of bandwidth per direction per lane. Multiple lanes can be grouped into links with typical widths of x1 (pronounced "by one"), x4, x8, and x16. A x16 Gen1 link provides 4 GB/s of bandwidth per direction. Moreover, unlike PCI, which shares bandwidth among all devices on the bus, each PCI Express device receives dedicated bandwidth. This allows more PXI modules to continuously stream data to and from the embedded controller.
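The following sketch tabulates per-direction link bandwidth from the per-lane figures used in this article (250 MB/s for Gen1 and, as discussed below, 500 MB/s for Gen2); the helper function and its name are illustrative only.

```python
# Rough per-direction PCI Express link bandwidth by generation and lane count.
PER_LANE_MB_S = {"Gen1": 250, "Gen2": 500}

def link_bandwidth_mb_s(generation, lanes):
    """Bandwidth of an xN link, per direction, in MB/s."""
    return PER_LANE_MB_S[generation] * lanes

for lanes in (1, 4, 8, 16):
    print(f"x{lanes:<2} Gen1: {link_bandwidth_mb_s('Gen1', lanes):5d} MB/s    "
          f"Gen2: {link_bandwidth_mb_s('Gen2', lanes):5d} MB/s")
# x16 Gen1 -> 4000 MB/s (4 GB/s), matching the figure quoted above.
```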


Figure 3. Links are defined by the number of lanes in the group and are annotated "by N," where N is the number of lanes. For example, PCI Express Gen1 lanes support 250 MB/s, compared to PCI Express Gen2 lanes, which support 500 MB/s.

PXI Express chassis can accommodate both PXI and PXI Express modules, and therefore can easily adapt to the needs of the application. As instrumentation continues to advance, the bus technology continues to evolve to provide even more bandwidth, most recently with the PCI Express 2.0 specification (also known as PCI Express Gen2). The Gen2 specification doubles the data transfer rate of Gen1 by doubling the bus bit rate from 2.5 GT/s to 5.0 GT/s while maintaining full backward hardware and software compatibility with PCI Express Gen1. PXI Express continuously takes advantage of the latest PCI Express advancements.

For example, the NI PXIe-8133 embedded controller uses PCI Express 2.0 to offer four x4 Gen2 PCI Express links for interfacing to the PXI chassis backplane. The PXIe-8133 offers up to 6.4 GB/s of total system bandwidth, twice that of the previous-generation embedded controller, which was based on PCI Express Gen1 links.


Figure 4. Taking advantage of PCI Express Gen2, applications can simultaneously stream a larger set of I/O channels, giving the ability to create larger and more complex data record-and-playback applications.

The PXIe-8133 embedded controller connects a x16 Gen2 link from the processor to an onboard PCIe switch, which in turn provides four x4 PCIe links to the PXIe-1075 chassis for 6.4 GB/s of bandwidth. By taking advantage of the latest processor technology, the memory controller on the PXIe-8133 interfaces to two channels of DDR3 1333 MHz DRAM, providing a total memory throughput of 8 GB/s. Together, the PCIe Gen2 links and memory capabilities increase the overall system bandwidth. In this configuration, pairing the PXIe-8133 with the PXIe-1075 chassis fully exercises the bandwidth of the chassis, for a total system bandwidth of 6.4 GB/s. With this architecture, the combination of chassis and embedded controller matches the bandwidth capabilities of the chassis and can enable even more data throughput as chassis designs evolve.
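As a back-of-the-envelope check on this configuration, the sketch below uses only the 6.4 GB/s and 8 GB/s figures quoted above and an assumed per-module streaming rate to estimate how many modules could stream concurrently; the 250 MS/s, 16-bit digitizer rate is a hypothetical example, not an NI specification.

```python
# Streaming budget for the PXIe-8133 / PXIe-1075 pairing, using only the figures
# quoted above. The per-module rate is an assumed example (a 16-bit digitizer
# sampling at 250 MS/s), not a product specification.
chassis_links_gb_s = 6.4            # four x4 Gen2 links to the chassis, per the text
memory_gb_s = 8.0                   # dual-channel DDR3 1333, per the text
module_rate_gb_s = 250e6 * 2 / 1e9  # 250 MS/s x 2 bytes/sample = 0.5 GB/s

# The sustained system rate is bounded by the narrower of the two stages;
# here the chassis links (6.4 GB/s) bind before the memory (8 GB/s) does.
system_limit_gb_s = min(chassis_links_gb_s, memory_gb_s)
print(f"Modules streaming concurrently at full rate: "
      f"{int(system_limit_gb_s // module_rate_gb_s)}")
```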

Peer-to-Peer Streaming Architectures

Some high-throughput applications have stringent requirements for communication latency and data bandwidth. These applications can take advantage of the latest peer-to-peer streaming technology. NI peer-to-peer (P2P) streaming technology uses PCI Express to enable direct, point-to-point transfers between multiple instruments without sending data through the host processor or memory. This lets devices in a system share information without burdening other system resources.
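The arithmetic below illustrates the benefit with an assumed 1 GB/s device-to-device stream: routed through the host, the stream consumes controller link and memory bandwidth twice over, whereas a peer-to-peer transfer through the PCIe switch consumes neither. The numbers and the simple two-crossing accounting are illustrative assumptions, not measured figures.

```python
# Why peer-to-peer transfers relieve the host: a hypothetical 1 GB/s stream
# from a digitizer to a coprocessor module (example numbers only).
stream_gb_s = 1.0

# Host-routed path: device -> controller memory -> device. The data crosses the
# controller's PCIe links twice and touches system memory for a write and a read.
host_link_traffic_gb_s = 2 * stream_gb_s
host_memory_traffic_gb_s = 2 * stream_gb_s

# Peer-to-peer path: device -> PCIe switch -> device, never reaching the host.
p2p_link_traffic_gb_s = 0.0
p2p_memory_traffic_gb_s = 0.0

print(f"Host-routed : {host_link_traffic_gb_s} GB/s on controller links, "
      f"{host_memory_traffic_gb_s} GB/s of memory traffic")
print(f"Peer-to-peer: {p2p_link_traffic_gb_s} GB/s on controller links, "
      f"{p2p_memory_traffic_gb_s} GB/s of memory traffic")
```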


Figure 5. With peer-to-peer technology, data packets bypass the host controller's memory, enabling data to stream directly between devices in a deterministic manner.

Optimizing the Streaming Architecture

Overall, for applications requiring high data-streaming rates, it is important to look at the entire data transfer architecture as a whole, from module to backplane to controller, to fully understand the capability of the platform. In the streaming architectures described above, the PCI Express links designed into the NI PXI Express embedded controller support enough bandwidth to transfer the maximum data rate from the chassis. When evaluating an alternate vendor's design, the choices made by that vendor could limit the overall system bandwidth.


Figure 6. When evaluating the options for an embedded controller and chassis, evaluate the architecture design choices to minimize system bottlenecks.

For example, in Figure 7, the PXI Express embedded controller from another vendor restricts the bandwidth between the chassis and the processor because it uses a PCI Express x8 Gen1 link from the CPU to the onboard PCIe switch. Although the rest of the system could handle up to 8 GB/s of streaming, this architecture choice limits the total system bandwidth to 2.0 GB/s. When designing streaming systems, ensure that the system components can work together to achieve their individual bandwidth specifications.
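A simple way to express this is that the end-to-end streaming rate is the minimum of the individual link bandwidths along the path; the sketch below applies that rule to the Figure 7 scenario described above.

```python
# A streaming path is only as fast as its narrowest link. This reproduces the
# Figure 7 scenario: an x8 Gen1 controller link (2 GB/s) in front of components
# that could otherwise sustain 8 GB/s.
def system_bandwidth_gb_s(link_bandwidths_gb_s):
    """Effective end-to-end bandwidth of a chain of links."""
    return min(link_bandwidths_gb_s)

print(system_bandwidth_gb_s([
    8.0,   # memory throughput
    2.0,   # x8 Gen1 link from the CPU to the onboard PCIe switch
    8.0,   # switch-to-backplane links
]))       # -> 2.0 GB/s
```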

In comparison, in the latest PXI Express platform products from National Instruments, the chassis and embedded controller have been optimized with these high-throughput use cases in mind. The PXIe-1085 chassis takes advantage of the latest PCI Express Gen3 technology and pairs with the PXIe-8880 embedded controller, which is based on an eight-core Intel Xeon processor. The PXIe-1085 also offers all-hybrid slots that accept a mix of PXI and PXI Express modules. Overall, this chassis combines high performance with flexibility for test and measurement applications.


Figure 7. Through advancements of PC bus technology, PXI Express systems continuously evolve data bandwidth capabilities to meet the latest test application needs.

The PXIe-8880 embedded controller provides a x16 and a x8 PCIe Gen3 link to the PXIe-1085 chassis for 25.6 GB/s of bandwidth. By taking advantage of the latest processor technology, the memory controller on the PXIe-8880 interfaces to three channels of DDR4 1866 MHz DRAM, providing a total memory throughput of 30 GB/s. With PCIe Gen3 technology and these memory capabilities, the overall system bandwidth doubles compared to previous-generation NI configurations and is three times higher than products from alternative vendors. The combination of these PXI Express platform products allows more high-performance instruments streaming at their maximum rates to be combined in a single chassis.
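Using only the figures quoted above, the sketch below checks which resource binds first and how many modules could stream concurrently at an assumed per-module rate; the 3 GB/s module rate is a hypothetical example, not a product specification.

```python
# Bandwidth budget for the PXIe-8880 / PXIe-1085 pairing, using only the figures
# quoted above. The per-module rate is an assumed example (a wideband digitizer
# streaming at 3 GB/s), not a product specification.
pcie_links_gb_s = 25.6     # x16 + x8 Gen3 links to the chassis, per the text
memory_gb_s = 30.0         # triple-channel DDR4 1866, per the text
module_rate_gb_s = 3.0

binding_limit_gb_s = min(pcie_links_gb_s, memory_gb_s)   # the PCIe links bind first
print(f"Concurrent full-rate modules supported: "
      f"{int(binding_limit_gb_s // module_rate_gb_s)}")
```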

Conclusion

Streaming performance for an application depends on several factors in the system. Evaluating each connection in a streaming architecture is important to maximize overall system throughput. Using the latest PXI Express technologies, the system bandwidth can support up to 25.6 GB/s. These bandwidth capabilities, combined with the chassis flexibility to accept a variety of PXI or PXI Express modules, can enable a vast array of applications and future-proof systems for evolving instrumentation.
