Streaming Data to and from Disk


Streaming is the act of transferring data directly to or from memory. This memory can be the onboard memory of the instrument, the RAM of the controller, or the hard drive of the controller. The rate at which data is transferred to these various types of memory is limited by several factors, from the system’s bus bandwidth to the read/write speed of the memory media. This document discusses the different streaming buses and memory types and provides general information about streaming and how it is implemented on the different buses and memory media.


Storage Media


Integrated Drive Electronics (IDE) was the original bus standard for hard drives and storage media.  Originally developed by Western Digital and Compaq in 1986, it was dubbed Advanced Technology Attachment (ATA).  As mass storage devices grew, the standard could no longer handle the capacity of the drives.  Thus, Enhanced IDE (EIDE) was born, allowing drives up to 8.4 GB in size.  At the same time, the standard was extended to support other devices such as CD- and DVD-ROM drives, tape drives, and large-capacity floppy drives.  Hard drives continued to grow, and the ATA standard was revised again to accommodate 137 GB drives and, with 48-bit addressing, up to 128 PiB (144 petabytes).  Since the introduction of Serial ATA (SATA), the names IDE, EIDE, and ATA have become synonymous with Parallel ATA (PATA).

Serial ATA

Serial Advanced Technology Attachment (SATA) was introduced in 2003 to alleviate the burden of constantly refining PATA’s dated technology.  The new standard also introduced a number of features unavailable in PATA.  One large difference in SATA is the use of Low Voltage Differential Signaling (LVDS), allowing signaling rates of 1.5 Gbit/s and higher.  The data is encoded using the 8B/10B scheme, similar to that used in Ethernet, Fibre Channel, and PCI Express.  Since 8B/10B encoding is 80% efficient, the actual realized transfer rate of the first generation, SATA/150 (SATA at 150 MB/s), is 1.2 Gbit/s (150 MB/s).
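The SATA figures above follow directly from the 8B/10B overhead; a quick back-of-the-envelope calculation (an illustration, not part of the original document) reproduces them:

```python
def effective_mb_per_s(line_rate_gbit: float) -> float:
    """Effective throughput of an 8B/10B-encoded link in MB/s.

    8B/10B encodes 8 data bits into 10 line bits, so only 80%
    of the raw signaling rate carries payload.
    """
    data_gbit = line_rate_gbit * 8 / 10   # strip the encoding overhead
    return data_gbit * 1000 / 8           # Gbit/s -> MB/s

# SATA/150: 1.5 Gbit/s line rate -> 1.2 Gbit/s payload -> 150 MB/s
print(effective_mb_per_s(1.5))  # 150.0
# SATA/300: doubled signaling rate -> 300 MB/s
print(effective_mb_per_s(3.0))  # 300.0
```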

SATA is currently a non-shared bus interface (this changes with SATA 6.0 Gbit/s): one bus serves one drive, giving each drive dedicated bandwidth.  In certain implementations, the SATA interface also allows drives to be hot-swapped.  After SATA/150 was released, a number of shortcomings soon revealed themselves.  The first and most important was SATA’s emulation of the PATA interface during the transition period.  PATA could handle only one pending transaction at a time, while SCSI had long benefited from its support of multiple outstanding requests, which enables the controller to re-order requests to optimize response time.  Native Command Queuing (NCQ) was therefore introduced for native SATA (non-PATA-emulated) drives, both SATA/150 and SATA/300.  SATA/300 simply doubles the signaling rate to 3 Gbit/s, allowing burst transfer rates of 300 MB/s; this answered the criticism that SATA/150 was hardly faster than ATA/133, even though the fastest desktop hard drives barely saturate the SATA/150 link.  Serial ATA 6.0 Gbit/s is on the current roadmap and will allow port multipliers, letting the bus be shared, and may enable the use of solid-state drives such as RAM disks.

RAID Systems

Redundant Array of Inexpensive Disks (RAID), also known as Redundant Array of Independent Drives, is a general term for mass storage schemes that split or replicate data across multiple hard drives.  Essentially, RAID combines a number of hard disks into one logical unit through hardware or software.  A number of different RAID configurations have appeared, but the main levels, RAID 0 through RAID 5, are discussed here:

  1. RAID 0: Striped Set without Parity (Requires Minimum 2 Disks) – Data is equally divided into fragments across a number of disks.  This multiplies the read and write speed by the number of hard drives present in the system (with a little loss to overhead) since one portion of data is equally split and written to or read from all disks at the same time.  While dramatically increasing performance, if any one drive fails, the entire array is corrupted and cannot be recovered.
  2. RAID 1: Mirrored Set (Requires Even Number, Minimum 2 Disks) – Data is mirrored across drives.  Increased read performance with slightly decreased write.  Operates as long as one drive is functioning.
  3. RAID 2: Striped Set at the Bit (Rather than Block) Level – The disks are synchronized by the controller to spin in perfect tandem, making extremely high data transfer rates possible.  This is the only original RAID level that is no longer used.  RAID 2 is also the only standard RAID level that can automatically recover accurate data from corrupt data; other RAID levels can detect corrupt data or reconstruct missing data, but cannot reliably resolve contradictions between parity bits and data bits without human intervention.
  4. RAID 3 and RAID 4:  Striped Set (Requires Minimum 3 Disks) with Dedicated Parity – Similar to RAID 0 but one extra drive is dedicated for Parity.  Maximum write performance is determined by the ability to write the parity data to the single drive.  Read performance is equal to RAID 0.  Fault tolerant to 1 drive.
  5. RAID 5:  Striped Set (Requires Minimum 3 Disks) with Distributed Parity – Similar to RAID 3 and RAID 4, but parity is rotated through all disks allowing reconstruction should any one drive fail.  Performance increase is just below RAID 0 but increased fault protection makes RAID 5 an ideal choice over RAID 0.
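The striping described in RAID 0 can be sketched in a few lines of Python (an illustration, not part of the original document): data is chopped into fixed-size stripe units, dealt round-robin across the member disks, and reassembled in the same order on read:

```python
def stripe(data: bytes, n_disks: int, unit: int = 4) -> list[list[bytes]]:
    """Split data into stripe units and distribute them round-robin."""
    disks = [[] for _ in range(n_disks)]
    chunks = [data[i:i + unit] for i in range(0, len(data), unit)]
    for i, chunk in enumerate(chunks):
        disks[i % n_disks].append(chunk)  # unit i goes to disk i mod N
    return disks

def reassemble(disks: list[list[bytes]]) -> bytes:
    """Read the stripe units back in round-robin order."""
    out = []
    for i in range(max(len(d) for d in disks)):
        for d in disks:
            if i < len(d):
                out.append(d[i])
    return b"".join(out)

payload = b"ABCDEFGHIJKLMNOP"
disks = stripe(payload, n_disks=2)
print(disks)                          # [[b'ABCD', b'IJKL'], [b'EFGH', b'MNOP']]
print(reassemble(disks) == payload)   # True
```

Because consecutive units land on different disks, every disk reads or writes its share of the data simultaneously, which is where the near-linear throughput scaling comes from; it also shows why losing any one member destroys the whole array.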

Figure 1:  Striping Data Across Multiple Disks By Using RAID

Bus Architectures


Peripheral Component Interconnect (PCI) is a standard for attaching peripheral devices to a motherboard.  A PCI device takes one of two forms: a planar device, an integrated circuit mounted directly on the motherboard, or an expansion card that fits into a socket.  With its widespread introduction by Intel in 1993, PCI 2.0 became the replacement for the Industry Standard Architecture (ISA) bus.  Improvements included a faster bus, a smaller form factor, and lower power consumption.  All devices communicate in parallel with the host, making PCI a shared bus.  Later revisions added 66 MHz and 133 MHz bus speeds.


PCI eXtensions for Instrumentation (PXI) standard is a modular instrumentation platform introduced by National Instruments in 1997.  PXI is based on the CompactPCI (cPCI) standard, which is a 3U or 6U Eurocard-based industrial computer.  The heart of PXI and cPCI is the backplane, which is a PCI bus.  Typically, a PXI system is 4U high and half or full rack width.


PCI Express (PCIe) 1.1 was introduced in 2004 by Intel.  Although designed to replace the PCI bus, PCIe cards are not backwards compatible with the PCI standard.  The major difference between PCIe and PCI is that PCIe is a dedicated serial bus: each PCIe device communicates serially with the host computer at its full bandwidth potential using lanes.  Each lane carries 250 MB/s in each direction, full duplex.  With a maximum of 16 lanes (the number of lanes is denoted by an x, e.g. x16, pronounced “by 16”), this results in 4 GB/s of theoretical bandwidth per direction.  Serial communication achieves higher throughput because the bits on each lane arrive at the destination in order, and the lanes do not have to be skew-aligned in hardware, since the data stream can be reconstructed afterwards.  On a parallel bus, the bits of a word may arrive at slightly different times, so sufficient settling time must be allowed for all bits to become valid; if the data is latched before all bits arrive, that portion of the data is permanently lost.  PCIe uses the common 8B/10B encoding scheme.  In January 2007, PCIe 2.0 was released; it doubled the throughput of each lane to 500 MB/s, yielding 8 GB/s per direction in x16 mode, while retaining backward compatibility with PCIe 1.1.


The PCI eXtensions for Instrumentation Express (PXIe) standard was introduced in August 2005, after the PCI Express standard was released.  Although PCIe cards are not backwards compatible with PCI, a PXIe chassis can host PXI cards in its hybrid slots.

Streaming Definition

The majority of acquisitions performed with stand-alone instruments are finite. The duration of the acquisition is dictated by the amount of onboard memory available in the instrument.  After the acquisition is complete, the data is transferred to the controlling PC via Ethernet, or more commonly, GPIB. Consider a case where data is sampled at 1 GS/s.  If the device has 256 MB of onboard memory per channel, the memory would be full and end the acquisition after about 250 ms.  If the instrument interfaces using the GPIB bus (which has a bandwidth of about 1 MB/s), the user must wait almost 4 ½ minutes (250 s) for this data to be transferred to the computer for analysis.  Now compare this to an NI instrument with the same sampling rate and onboard memory.  The same data transfer would take less than 3 seconds over the high-bandwidth PCI/PXI bus, a more than 80x improvement!  The PCI Express/PXI Express bus enables even faster data transfers.
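The transfer times above follow from simple division; a small calculation (using the figures assumed in the text: 256 MB of data, roughly 1 MB/s for GPIB, and roughly 100 MB/s sustained for PCI/PXI) makes the comparison concrete:

```python
def transfer_time_s(data_mb: float, bandwidth_mb_s: float) -> float:
    """Seconds needed to move data_mb megabytes at bandwidth_mb_s MB/s."""
    return data_mb / bandwidth_mb_s

data_mb = 256                         # onboard memory per channel
print(transfer_time_s(data_mb, 1))    # GPIB at ~1 MB/s -> 256.0 s (~4 minutes)
print(transfer_time_s(data_mb, 100))  # PCI/PXI sustained ~100 MB/s -> 2.56 s
```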

Now, let’s take this example a step further.  Implementing streaming from the instrument, through the controller, and onto hard disk increases the available memory of the instrument from megabytes to terabytes.  By utilizing the high-bandwidth PXI and PXIe bus architectures, data can stream to and from hard disk at a rate high enough to support the instrumentation.  This means that oscilloscopes can acquire data and store it directly to disk, while arbitrary waveform generators can pull data directly from disk, bypassing the previously-limiting onboard memory.

In the past, engineers have used a technique known as “triggering” to overcome memory limitations.  This involves waiting for a specific event, known as a “trigger”, to occur before beginning acquisition.  However, once acquisition begins, the application is still limited by the amount of memory available.  With streaming data to disk, the amount of memory available is virtually infinite, making triggering unnecessary.  Now engineers can collect all of the data and see what happened leading up to the event in question, as well as the after-effects of the event.  This could also lead to post-processing of data for multiple events separated by long periods of time.  Data streaming enables faster sampling rates over longer testing periods than ever before.

Streaming Data with RAID

One of the bottlenecks for data streaming is the read/write speed of the actual hard drive.  As data is being written to a hard drive, the I/O head of the disk limits the rate at which data can be transferred to the physical disk.  This factor is the main limitation to streaming rates with single hard drive systems.

By using a RAID-0 hard drive array, the data throughput to disk can be improved by striping the data over multiple drives.  This separates the read/write operations and allows them to occur in parallel.  Therefore, in addition to allowing us to combine multiple hard drives together to increase overall memory size, RAID-0 also increases overall data throughput speed.

Streaming Data with the PXI Bus

PCI is a parallel bus. The most common implementation, and that which is used in PXI, is 32-bits wide with a 33 MHz clock. This results in a theoretical maximum bandwidth of 132 MB/s (approximately 110 MBytes/s can be sustained). Because the bus is parallel, all of the devices on the bus share its bandwidth. Data that is acquired by a PCI device is transferred from onboard device memory across the PCI bus, through the PCI controller, across the I/O bus, and into system memory (RAM). It can then be transferred from system memory, across the I/O bus, onto a hard drive(s). The CPU is responsible for managing this process. Data that is generated by a PCI device follows the opposite path. Additionally, peer-to-peer data streaming between two devices on the PCI bus is possible.

Streaming Data with the PXIe Bus

PCI Express, an evolution of the PCI bus, maintains software compatibility with PCI but replaces the parallel bus with a high-speed (2.5 Gbits/s) serial bus. PCI Express sends data through differential signal pairs called lanes, which offer 250 MBytes/s of bandwidth per direction per lane. Multiple lanes can be grouped together to form links with typical link widths of x1 (pronounced "by one"), x4, x8, and x16. A x16 link provides 4 GBytes/s bandwidth per direction. Moreover, unlike PCI, which shares bandwidth with all devices on the bus, each PCI Express device receives dedicated bandwidth. PXI Express supports up to 6 GBytes/s of total system bandwidth (controller to backplane) and up to 2 GBytes/s of dedicated bandwidth per slot (backplane to module).
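Since PCI Express 1.x also uses 8B/10B encoding, the per-lane and per-link figures quoted above can be derived the same way as the SATA numbers; the following sketch (an illustration only) scales the 2.5 Gbit/s line rate to the common link widths:

```python
def pcie_link_mb_s(lanes: int, line_rate_gbit: float = 2.5) -> float:
    """Per-direction bandwidth of a PCIe 1.x link in MB/s.

    Each lane signals at 2.5 Gbit/s; 8B/10B encoding leaves 80% of
    that as payload, i.e. 250 MB/s per lane per direction.
    """
    per_lane = line_rate_gbit * 0.8 * 1000 / 8  # 250.0 MB/s per lane
    return lanes * per_lane

for width in (1, 4, 8, 16):
    print(f"x{width}: {pcie_link_mb_s(width):.0f} MB/s per direction")
# x16 -> 4000 MB/s, i.e. 4 GB/s per direction, matching the text
```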

Data that is acquired by a PCI Express device is transferred from onboard device memory across a dedicated PCI Express link, across the I/O bus, and into system memory. It can then be transferred from system memory, across the I/O bus, onto a hard drive(s). The CPU is responsible for managing this process. Data that is generated by a PCI Express device follows the opposite path. Peer-to-peer data streaming is also possible between two PCI Express devices.

Streaming Benchmarks

 Hard Drive(s)                                             Rate(s) (MB/s) write/read

 2.5" PATA (Fujitsu 40GB, 5400 RPM)                        30
 3.5" PATA (WD 160GB, 7200 RPM)                            57
 3.5" SATA (WD 160GB, 7200 RPM)                            62
 3.5" SATA (Seagate Barracuda 7200.10 250GB, 7200 RPM)     70
 2-drive RAID-0 (PXI-8351)                                 114/127
 2-drive RAID-0 (ExpressCard to eSATA Thecus RAID box)     117/114
 4-drive RAID-0 (ExpressCard to eSATA Addonics RAID box)   120/123
 4-drive RAID-0 (PCIe x4 HighPoint RocketRAID 2320)        243/240
 8-drive RAID-0 (PCIe x4 HighPoint RocketRAID 2320)        448/439
 8-drive RAID-0 (PCIe x4 Promise RAID box)                 370/374
 12-drive RAID-0 (PCIe x8 HighPoint RocketRAID 2340)       600+/700+

*Hard drive streaming only -- no instrument I/O was being performed

*These rates require the streaming techniques mentioned in this document

Benchmarking Conclusions
  • 3.5” hard drives perform better than 2.5” drives
  • SATA data rates do not differ significantly from PATA
  • To increase streaming speed, hard drive speed makes the biggest difference

Streaming Caveats

To achieve the streaming rates mentioned in this document, several limitations of data streaming must be kept in mind.  Two of these limitations concern the hard drive; the other two stem from the PCI/PXI bus architecture.

Hard Disk File Location:  Outer Rim vs. Inner Rim

When performing file operations on a hard drive, faster data rates can be achieved if the data is being accessed on the outer rim of the hard drive.  As shown in the following figure, data transfer rates can decrease drastically as the I/O head moves inward on the disk.  Therefore, in order to achieve maximum data transfer rates, streaming files should be located on the outer rim of the disk. 

One complication is that Windows allocates disk space from the outer rim inward.  Therefore, to reserve the outer rim of the disk for streaming file I/O, the disk should be partitioned so that the operating system resides on the inner rim.

PCI bridges

To provide additional PCI device connections, a PCI bridge is often used; this bridge connects multiple PCI buses.  However, PCI bridges can have a significant effect on data streaming rates.  For read operations the rate decrease is minimal, but for output operations a PCI bridge can decrease data rates by as much as 20-40% per bridge.  Keep in mind that larger PXI chassis also use PCI bridges to accommodate additional module slots, so any streaming output devices should be placed in the first few slots of the PXI chassis to achieve maximum streaming output rates.

PXI card in PXIe chassis

One of the benefits of the PXIe bus is its backwards compatibility with PXI modules.  However, when a PXI card is placed in a PXIe hybrid chassis, a PCIe-to-PCI bridge is used, and this bridge has a detrimental effect on data streaming rates.  To achieve maximum streaming rates with PXI modules, use them in a PXI chassis.

Win32 File I/O and buffering

When using LabVIEW to implement data streaming, the best throughput is achieved with the Win32 File I/O VIs.  These VIs implement much more efficient file I/O operations than any other LabVIEW methodology.  One of the main benefits of these low-level functions is the ability to turn off the file buffering that Windows implements by default; this buffering significantly reduces streaming throughput to hard disk.  Therefore, using the Win32 File I/O VIs with buffering disabled allows the highest data streaming rates in LabVIEW.
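On Windows, buffering is bypassed by opening the file with the Win32 CreateFile API and the FILE_FLAG_NO_BUFFERING flag, which is what the Win32 File I/O VIs expose.  As a rough cross-platform analogue (an illustration only, not the LabVIEW implementation), Python can open a file with its own user-space buffering disabled so that each write of a large chunk goes straight to the operating system:

```python
import os
import tempfile

# Hypothetical path for illustration; a real streaming application would
# write to a dedicated partition on the outer rim of the disk.
path = os.path.join(tempfile.gettempdir(), "stream.bin")

chunk = b"\x00" * (1 << 20)               # write in large 1 MiB chunks
with open(path, "wb", buffering=0) as f:  # buffering=0: no Python-level buffer
    for _ in range(8):
        f.write(chunk)                    # each call issues one OS write

print(os.path.getsize(path))              # 8388608 bytes (8 MiB)
os.remove(path)
```

Note that buffering=0 disables only the application-level buffer; bypassing the Windows file cache itself requires FILE_FLAG_NO_BUFFERING with sector-aligned buffers, which the Win32 VIs handle.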