Parallelism has changed the way engineers and scientists approach problems and find new ways to innovate. From increasing the performance of hardware inputs and outputs (I/O) to maximizing the benefits of multicore processors, data acquisition systems continue to grow more advanced, taking PC-based measurements into new applications. NI LabVIEW 2009 software and NI X Series multifunction data acquisition (DAQ) hardware are the next generation of products to extend functionality with parallel operation from signal to software.
2. Parallel Programming in LabVIEW 2009
As a graphical programming language, LabVIEW provides a unique approach to parallel programming with visual block diagrams and dataflow execution. LabVIEW 2009 adds new features and capabilities to make the most of multicore processor systems and maximize software performance. Some techniques in parallel programming are more conceptual, like pipelining and data parallelism, while others depend on specific LabVIEW structures, such as timed loops and parallel for loops.
Pipelining and Data Parallelism
Pipelining is similar to an assembly line. Consider this approach for streaming applications and whenever a CPU-intensive algorithm must run as a sequence of steps, each of which takes considerable time.
Figure 1. Sequential Stages of an Algorithm
Like an assembly line, each stage focuses on one unit of work. Each result passes to the next stage until the final stage.
To apply a pipelining strategy to an application you want to run on a multicore processor, break the algorithm into steps that have roughly the same unit of work and run each step on a separate core. The algorithm can repeat on multiple sets of data or on data that streams continuously.
Figure 2. Pipelined Approach
The key is to break up the algorithm into steps that take equal time because each iteration takes as long as the longest individual step in the overall process. For example, if step 2 takes one minute to complete but steps 1, 3, and 4 each take 10 seconds, the entire iteration takes one minute to complete. Caveats to this technique arise when data falls out of cache or when the penalty for intercore communication exceeds the gain in performance.
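The timing argument above can be checked with a little arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and the model assumes one core per stage with no cache or intercore communication penalty (the caveats just mentioned).

```python
def sequential_time(stage_times, items):
    """Total time when a single core runs every stage for each item in turn."""
    return items * sum(stage_times)


def pipelined_time(stage_times, items):
    """Total time when each stage runs on its own core.

    The first item needs sum(stage_times) to travel through the pipeline.
    After that, one result emerges every max(stage_times), because the
    slowest stage sets the pace of every iteration.
    """
    if items == 0:
        return 0
    return sum(stage_times) + (items - 1) * max(stage_times)


unbalanced = [10, 60, 10, 10]        # seconds per stage, as in the example above
balanced = [22.5, 22.5, 22.5, 22.5]  # the same 90 s of work, split evenly

for name, stages in (("unbalanced", unbalanced), ("balanced", balanced)):
    print(name, sequential_time(stages, 100), pipelined_time(stages, 100))
```

Under these assumptions, 100 items take 9000 s sequentially, 6030 s with the unbalanced pipeline, and 2317.5 s with the balanced one, which is why evenly sized stages matter.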
The LabVIEW block diagram in Figure 3 shows an example of the pipelining approach. The black-bordered for loops contain stages S1, S2, S3, and S4, which represent the functions in the algorithm that must run in sequence. Because LabVIEW is a dataflow language, the output of each function passes along the wire to the input of the next.
Figure 3. Pipelined Approach in LabVIEW
A Feedback Node appears as an arrow above a small dot. Feedback Nodes separate the functions into distinct pipeline stages. A nonpipelined version of the same code looks similar but has no Feedback Nodes. A common example that benefits from this technique is a streaming application in which fast Fourier transform (FFT) processing must be applied to the data one step at a time.
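Because a LabVIEW block diagram cannot be reproduced as text, the following Python sketch models the dataflow of a pipelined loop under stated assumptions. The `held` list stands in for the Feedback Nodes: on each loop iteration, every stage reads the value its predecessor produced on the previous iteration. The stage functions and all names are hypothetical, and the stages run one after another here, whereas in LabVIEW they are free to execute on separate cores.

```python
_EMPTY = object()  # marks a pipeline slot that holds no data yet


def run_pipelined(inputs, stages):
    """Push inputs through the stages, one hand-off per loop iteration."""
    held = [_EMPTY] * (len(stages) - 1)  # values stored between iterations
    # Extra iterations at the end flush the items still inside the pipeline.
    feed = list(inputs) + [_EMPTY] * (len(stages) - 1)
    results = []
    for item in feed:
        # Stage 1 sees the new item; stage k sees what stage k-1 produced
        # during the previous iteration.
        stage_inputs = [item] + held
        stage_outputs = [
            _EMPTY if value is _EMPTY else stage(value)
            for stage, value in zip(stages, stage_inputs)
        ]
        held = stage_outputs[:-1]
        if stage_outputs[-1] is not _EMPTY:
            results.append(stage_outputs[-1])
    return results


# Hypothetical stand-ins for S1 to S4.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x ** 2]
print(run_pipelined([1, 2, 3], stages))
```

Within one iteration no stage depends on another stage's current output, which is the property that lets the stages run at the same time.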
You can apply data parallelism to large data sets by splitting up a large array or matrix into subsets, performing the operation, and combining the result. First, consider the sequential implementation, where a single CPU attempts to process the entire data set.
Figure 4. A Single CPU Processing a Large Data Set
Now consider the example in Figure 5, where the same data set is split into four parts. You can spread these parts across the available cores to achieve a significant increase in speed.
Figure 5. Multiple Cores Using Data Parallelism
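The split, process, and combine steps can be sketched in Python as follows. This is a structural illustration rather than a performance benchmark: `process` is a hypothetical stand-in for the real operation, and the sketch uses threads for simplicity. In CPython, CPU-bound work would typically need `ProcessPoolExecutor` instead of `ThreadPoolExecutor` to occupy several cores.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Cut data into `parts` contiguous chunks of nearly equal length."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def process(chunk):
    """Hypothetical per-element operation applied to one chunk."""
    return [x * x for x in chunk]


def process_in_parallel(data, parts=4):
    """Split the data set, process each part on a worker, and recombine."""
    chunks = split(data, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        combined = []
        # map() yields results in the same order as the input chunks.
        for part in pool.map(process, chunks):
            combined.extend(part)
    return combined


print(process_in_parallel(list(range(10))))
```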
Timed-Loop and Parallel For-Loop Structures in LabVIEW
The timed-loop structure in LabVIEW acts as a while loop but with special characteristics that can help you optimize performance based on the multicore hardware configuration. For example, any code enclosed within the loop executes in a single thread. In multicore systems, you can assign the loop to a particular processor core, which is useful for simultaneously executing graphical code within parallel timed loops.
Figure 6. The timed-loop structure in LabVIEW can execute on an assigned processor core.
The new parallel for-loop structure in LabVIEW 2009 divides the iterations of a for loop among your processor cores. Use it for code that has no dependencies from one iteration to the next. To configure it, right-click the border of a regular for loop and select “Configure Iteration Parallelism…” (Figure 7).
Figure 7. Turn any for loop into a parallel for loop by right-clicking on the structure border.
A new P terminal (for Parallelism enabled) appears on the border of the loop, where you can enter the number of parallel loop instances.
Figure 8. Parallel For Loop Enabled with Four Parallel Instances
The parallel for loop is a valid approach for an intensive operation that needs to execute over and over in a loop. However, if the code has dependencies, you should not use the parallel for loop because the dependencies imply that you should execute the algorithm sequentially. In that case, consider another technique, such as the previously discussed pipelining, to achieve parallelism.
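The distinction between independent and dependent iterations can be illustrated with a short Python analogy. It is not LabVIEW code: the function names are hypothetical, and the `p` argument merely plays the role of the value wired to the P terminal.

```python
from concurrent.futures import ThreadPoolExecutor


def independent_iteration(i):
    """Uses only the loop index, so iterations can run in any order."""
    return i * i + 1


def parallel_for(n, p):
    """Run n independent iterations across p worker instances."""
    with ThreadPoolExecutor(max_workers=p) as pool:
        return list(pool.map(independent_iteration, range(n)))


def running_total(values):
    """Each iteration needs the previous total, so this loop stays sequential."""
    total, out = 0, []
    for v in values:
        total += v  # dependency carried from one iteration to the next
        out.append(total)
    return out


print(parallel_for(8, 4))
print(running_total([1, 2, 3, 4]))
```

The first loop is a candidate for iteration parallelism. The second is not, because splitting it across workers would break the chain of totals.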
3. Parallel I/O Timing with X Series Multifunction DAQ
In addition to the software parallelism advancements incorporated into LabVIEW 2009, hardware parallelism has improved with the new NI X Series multifunction DAQ devices. With the parallel timing and triggering capabilities of NI-STC3 technology on PCI Express and PXI Express, X Series offers a new level of excellence in multifunction I/O.
Figure 9. PCI Express X Series Multifunction DAQ
NI-STC3 Timing and Synchronization Technology
All multifunction data acquisition hardware requires timing circuitry to control analog-to-digital converters (ADCs), digital-to-analog converters (DACs), digital I/O lines, and counters. The new X Series uses the latest version of National Instruments timing technology, called NI-STC3, which builds hardware parallelism into I/O operation.
NI-STC3 features include the following:
- Four parallel counters with native PWM and encoder support
- Independent timing and synchronization engines
- Better channel timing and triggering with a 100 MHz timebase
Counters are the most versatile components on any data acquisition device. You can use them for simple edge counting, frequency measurements, timing, pulse generation, interfacing with encoders, modulating power with PWM output, and even generating sample clocks and triggers for other types of board I/O. You can implement all of these counter task examples with hardware-timed speed and reliability because counters execute with very little software involvement. Due to this combination of diversity and performance, it was fairly common to run out of counters sooner than expected when building a data acquisition system.
New NI-STC3 technology takes counter channels one step further, offering four 100 MHz counters with 32-bit resolution. Not only are there four higher-speed counters on a single X Series device, but you can now accomplish operations that previously required two counters with only one. To control a stepper motor in the past, for example, you often needed to generate a finite number of digital pulses, which involved using one counter to continuously generate pulses and a second counter to gate pulses being sent to the motor. New NI-STC3 counters can accomplish this task with a single counter channel, leaving the remaining counters free for you to do more with a single device.
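To put the 100 MHz, 32-bit figures in perspective, the following Python sketch works through two pieces of arithmetic. It assumes a simple generic counter model in which pulses are built from whole timebase ticks. The function names are hypothetical and nothing here calls a driver API.

```python
TIMEBASE_HZ = 100_000_000  # 100 MHz counter timebase, from the text
COUNTER_BITS = 32          # counter resolution, from the text


def rollover_seconds(timebase_hz=TIMEBASE_HZ, bits=COUNTER_BITS):
    """Time for a counter clocked by the timebase to count through its full range."""
    return 2 ** bits / timebase_hz


def pulse_ticks(frequency_hz, duty_cycle, timebase_hz=TIMEBASE_HZ):
    """High and low durations of one pulse, expressed in timebase ticks."""
    period = round(timebase_hz / frequency_hz)
    high = round(period * duty_cycle)
    return high, period - high


print(rollover_seconds())        # about 42.9 s before a 32-bit count wraps
print(pulse_ticks(1_000, 0.25))  # 1 kHz pulse train at 25% duty cycle
```

In this model, a 1 kHz pulse train at 25% duty cycle is 25,000 ticks high and 75,000 ticks low, and a stepper move of N steps corresponds to N such periods.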
Advanced timing and triggering functionality on multifunction data acquisition devices has often relied on onboard counters and complex signal routing to achieve specialized hardware-timed performance. With NI-STC3 technology, X Series multifunction DAQ devices offer parallel timing engines to each group of I/O on a multifunction device. Analog input channels, analog output channels, digital I/O lines, and each of the counters can now function with independent timing capabilities. In the past, you needed to correlate digital I/O lines to another timing engine for hardware-timed digital waveforms. But now they can work in parallel with each other group of I/O, offering new capabilities with a single multifunction data acquisition device.
Another example of improved timing is a retriggerable acquisition, which involves waiting for a trigger condition to be met, taking a finite number of samples, and then immediately rearming the trigger for the next acquisition. Using driver software function calls to rearm the trigger risks missing the next trigger due to software latency; therefore, the best possible performance requires a hardware-timed approach. In the past, counters were the only way to implement hardware-timed retriggering, so counters were used to generate a retriggerable pulse train, which was then internally routed to act as the analog input sample clock. With NI-STC3 technology on new X Series DAQ devices, analog channels no longer require the use of counters to implement retriggerable acquisitions, and triggers can independently rearm themselves without software intervention.
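The effect of rearm latency on a retriggerable acquisition can be illustrated with a small simulation. This Python sketch is a simplified model, not driver code. The trigger spacing, acquisition length, and 2 ms software latency are hypothetical numbers chosen for illustration, and hardware rearming is idealized as zero latency.

```python
def captured_triggers(trigger_times_us, acquisition_us, rearm_latency_us):
    """Return the trigger times (in microseconds) that start an acquisition.

    A trigger is honoured only if it arrives after the previous finite
    acquisition has finished and the trigger has been rearmed.
    """
    captured = []
    armed_at_us = 0
    for t in sorted(trigger_times_us):
        if t >= armed_at_us:
            captured.append(t)
            armed_at_us = t + acquisition_us + rearm_latency_us
    return captured


triggers = list(range(0, 10_000, 1_000))  # one trigger every 1 ms
hardware = captured_triggers(triggers, acquisition_us=500, rearm_latency_us=0)
software = captured_triggers(triggers, acquisition_us=500, rearm_latency_us=2_000)
print(len(hardware), len(software))
```

With these numbers the idealized hardware rearm catches all 10 triggers, while the software rearm catches only 4, because every trigger that arrives during the rearm delay is lost.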
The onboard timebase of any data acquisition device acts as the internal heartbeat that drives all digital circuitry. From sample clocks to trigger lines, everything uses the timebase as an onboard reference to generate clock frequencies and latch digital edges. The timebase frequency determines how close the actual sampling rate can get to the requested sampling rate. Devices based on NI-STC2 incorporate a 20 MHz timebase to derive analog and digital sample clocks, using 50 ns increments to get as close to the user-specified frequency as possible. NI-STC3 technology uses a new 100 MHz timebase for all analog and digital timing, so sample clock periods are set in 10 ns increments, a resolution five times finer than before.
Analog triggering also uses the onboard timebase to start the sample clock once the trigger condition has been met. Triggers can occur at any point between timebase clock cycles, and the new 100 MHz timebase reduces the maximum delay from 50 ns to 10 ns. This is particularly important for applications that need fast reaction times to trigger events, such as crash testing and transient recording.
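The link between timebase frequency and achievable sampling rate can be worked through numerically. The Python sketch below assumes the sample clock is produced by dividing the timebase by a whole number, which is consistent with the 50 ns and 10 ns increments described above but is a simplification. The function name and the 150 kHz request are hypothetical.

```python
def nearest_rate(requested_hz, timebase_hz):
    """Return the integer divisor and the resulting rate closest to the request."""
    divisor = max(1, round(timebase_hz / requested_hz))
    return divisor, timebase_hz / divisor


requested = 150_000  # Hz, an arbitrary rate that neither timebase divides evenly
for label, timebase in (("20 MHz", 20_000_000), ("100 MHz", 100_000_000)):
    divisor, actual = nearest_rate(requested, timebase)
    error = actual - requested
    print(f"{label}: divisor {divisor}, actual {actual:.2f} Hz, error {error:+.2f} Hz")
```

Under this model, the 20 MHz timebase lands roughly 376 Hz above the requested 150 kHz, while the 100 MHz timebase lands roughly 75 Hz below it. Rates that divide the timebase evenly, such as 100 kHz, are exact in both cases.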
4. Go Parallel from Signal to Software
Graphical dataflow programming maps naturally onto multicore processors, and LabVIEW 2009 introduces new features to maximize the computing power of your multicore system. New NI-STC3 technology on X Series multifunction DAQ devices takes PC-based data acquisition to new levels of functionality and performance with four enhanced counters, independent timing engines, and a new 100 MHz timebase. You can take advantage of parallel technology to build a software-defined measurement system that meets your unique application needs.