NI LabVIEW 2010 Performance Details

Publish Date: Oct 04, 2011

Overview

Each new version of NI LabVIEW software provides the latest features to make you more productive, offers access to new technologies, and delivers fixes to existing issues in the product. All of this affects the LabVIEW compiler, and can cause the machine code that the compiler generates to change. As a consequence, the edit-time performance of the LabVIEW development environment and run-time performance of your LabVIEW applications vary from version to version. This document examines key changes in LabVIEW 2010, the results you can expect from these changes, and methods to improve the performance of your LabVIEW applications.

Table of Contents

  1. The LabVIEW Compiler
  2. Overview of Changes in LabVIEW 2010
  3. What Changes Can You Make? 

1. The LabVIEW Compiler

The LabVIEW compiler is one of the key features engineers rely on for productivity. It abstracts typically complex tasks such as memory allocation and thread management while providing edit-time feedback when block diagram code is not executable. Over the lifetime of LabVIEW, the compiler hierarchy has evolved significantly. In fact, the first LabVIEW "compiler" wasn't a compiler at all: LabVIEW 1.0 used an interpreter to execute code for the Motorola 68000-based Macintosh. It lacked data copy reduction and polymorphism, used extended-precision floats throughout, and implemented only local type checking. NI incorporated inplaceness in LabVIEW 1.1 and added compiler features such as type propagation, clumping algorithms, and a linker in LabVIEW 2.0. Over the years, NI continually made the compiler "smarter" so it could support different processor architectures such as x86 and PowerPC. Concepts such as attribute nodes, global and local variables, and type definitions were introduced in LabVIEW 3.0, with virtual registers and the SmartHeap memory manager following in LabVIEW 3.1. LabVIEW 4.0 brought profiling and debugging and started optimizing code with constant folding. This trend continued through LabVIEW 7.1.

It is a stretch to call the LabVIEW code generator before LabVIEW 8.0 an optimizing compiler because no real changes or optimizations were being made to the generated code itself. In fact, all of the code generation was ad hoc and used hard-coded registers. LabVIEW 8.0 introduced a streaming intermediate language, virtual registers, and a simple register allocator. LabVIEW 2009 took a major step forward by introducing the dataflow intermediate representation (DFIR), a framework for transformations that occur on a dataflow graph generated from the block diagram. LabVIEW 2010 goes a step further by adding a low-level virtual machine (LLVM)-based back end, enabling a whole suite of standard compiler optimizations. Figure 1 illustrates the compile chain in LabVIEW 2010.

Figure 1. LabVIEW Compiler Hierarchy

The DFIR is a high-level representation that preserves dataflow, parallelism, and LabVIEW execution system semantics. The LLVM representation is lower level and sequential, with no concept of parallelism, but it carries knowledge of target machine characteristics such as instruction sets and alignment. With these two additions, the LabVIEW compiler hierarchy now has two new "layers" where optimizations can be implemented. Some optimizations, or transforms, such as parallel For Loop decomposition or subVI inlining, can be conducted only in the DFIR. Others, such as register allocation, loop strength reduction, and basic block ordering, can be implemented only in the LLVM. Still others, such as loop invariant code motion and dead code elimination, make sense in both representations. However you slice it, the DFIR and LLVM are new additions to the LabVIEW compiler hierarchy that complement each other and allow the compiler to optimize your code for faster run-time execution without requiring you to make any changes.

For a more detailed discussion of the LabVIEW compiler, read LabVIEW Compiler: Under the Hood.


2. Overview of Changes in LabVIEW 2010

With every new version of LabVIEW, the performance of LabVIEW applications is affected in several ways, including load time, run time, memory usage, and size on disk. Whether the cause is a new feature, a bug fix, or a compiler enhancement, these changes mean your LabVIEW code may run faster or slower with each release. The following sections explore the speedups and slowdowns you can expect in LabVIEW 2010.

LabVIEW 2010 Focus Area: Run-Time Performance

One of the primary focuses for the 2010 release of LabVIEW was VI run-time performance – to improve how quickly your VI executes in 2010 relative to 2009 without your having to change the VI block diagram itself. The next section provides an overview of the optimizations that deliver this increased performance along with the benchmarks that were used internally at National Instruments to validate the findings.

Custom Optimizations

The addition of the DFIR and LLVM to the LabVIEW compiler hierarchy exposes new layers where the compiler can make the block diagram you write operate more efficiently without changing the way you program. This happens in two ways: 1) a lowering transform decomposes the code into a more granular representation to enable more optimization, or 2) an optimization changes the representation of the code so that it operates more efficiently.

Transforms and Optimizations

At the highest level, an optimization is a rewriting or restructuring of the dataflow graph to provide better performance. Eliminating unnecessary code, reordering operations, and minimizing overhead are common examples. Savvy programmers can implement many of these by hand, but with this functionality built into the compiler, you don't need to be an expert programmer to achieve good performance. Additionally, many of these optimizations open up opportunities for further optimizations, and automating the process ensures those opportunities are applied consistently.

Common Subexpression Elimination

Many LabVIEW applications feature numerical calculations whose results are used in different places throughout the diagram. As a programmer, it is much simpler to recompute, say, the square root of 2 in multiple places than to wire one calculation across a large diagram. However, reusing the initial calculation saves time and memory. The compiler can identify these situations and optimize the underlying code for you, so you can keep the diagram simple and readable. Figure 2 shows an example.

                Figure 2. The common subexpression elimination decomposition can reuse values stored in memory even if the same LabVIEW wire isn’t reused.
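In a textual language, the same idea can be sketched as follows. This is not LabVIEW code, just a hypothetical illustration of what the compiler does: the first form is rewritten into the second, so the shared subexpression is computed once and its stored value reused.

```python
from math import sqrt

# Sketch of common subexpression elimination (CSE), not LabVIEW code:
# sqrt(2.0) appears twice in `before`; the compiler effectively rewrites
# it into `after`, computing the shared value once and reusing it.
def before(a, b):
    return a * sqrt(2.0) + b * sqrt(2.0)   # sqrt(2.0) evaluated twice

def after(a, b):
    root2 = sqrt(2.0)                      # evaluated once, stored, reused
    return a * root2 + b * root2
```

Both forms return identical results; only the number of square-root evaluations differs.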

Constant Folder

Although the algorithm itself is highly complicated, the constant folder essentially identifies when a wire value on the block diagram is derived only from constant inputs, which means that for that compile, the value will never change and can be computed once up front. In LabVIEW, you can turn on "fuzzy wires" to indicate where constant folding is occurring. In the Tools»Options menu, the Block Diagram section has an option to show the constant folding of wires and structures:

  Figure 3. Constant Folding in a LabVIEW Block Diagram
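To illustrate the idea outside of LabVIEW, here is a minimal, hypothetical constant folder over Python expression trees (not NI's algorithm): any arithmetic node whose inputs are all constants is evaluated at "compile" time, exactly what the fuzzy wires indicate on a block diagram.

```python
import ast
import operator

# Operators this toy folder knows how to evaluate ahead of time.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

class Folder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)           # fold the children first
        if (isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and type(node.op) in OPS):
            value = OPS[type(node.op)](node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value), node)
        return node

def fold(expr: str) -> str:
    """Return the expression with all constant arithmetic pre-evaluated."""
    return ast.unparse(Folder().visit(ast.parse(expr, mode="eval")))
```

For example, `fold("2 * 3 + x")` folds the purely constant multiply but leaves the variable part of the expression alone.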

Dead Code Elimination and Unreachable Code Elimination Transforms

Code whose results are never used is dead code. Removing dead code speeds up execution because the removed code no longer runs. Dead code is usually produced by other transforms' manipulation of the DFIR graph rather than written by users directly. As an example, consider the following VI. Unreachable code elimination determines that the case structure can be removed. This "creates" dead code for the dead code elimination transform to remove.

Figure 4. Original Code before any Optimizations

Figure 5. Unreachable code elimination can remove code that is never executed.

Figure 6. Dead code elimination can eliminate code that is unnecessary.

Most of the transforms covered in this section have interrelationships like this; running one transform may uncover opportunities for other transforms to run.
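The Figure 4-6 sequence can be sketched in textual form (again, not LabVIEW code, just a hypothetical illustration): removing the unreachable branch makes the value it consumed dead, which the second transform then removes.

```python
# Sketch of unreachable code elimination followed by dead code
# elimination, not LabVIEW code.
def expensive(x):
    return x ** 10

def before(x):
    unused = expensive(x)    # becomes dead once the branch below is gone
    if False:                # statically unreachable "case structure"
        return unused
    return x + 1

def after(x):                # what remains once both transforms run
    return x + 1
```

The observable behavior is unchanged; only the wasted work disappears.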

Loop Invariant Code Motion Transform

Loop invariant code motion identifies code inside the body of a loop that can be safely moved outside. Because the moved code is executed fewer times, overall execution speed improves.

Figure 7. Before Loop Invariant Code Motion       

Figure 8. After Loop Invariant Code Motion 

In this case, you can move the increment operation outside the loop.  The loop body remains to build the array, but the calculation doesn't need to be repeated in each iteration.
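A textual sketch of the same transform (not LabVIEW code): the increment does not depend on the loop iteration, so the compiler hoists it above the loop.

```python
# Sketch of loop invariant code motion, not LabVIEW code: `n + 1` is
# invariant across iterations, so it is computed once before the loop.
def before(n, count):
    out = []
    for _ in range(count):
        out.append(n + 1)        # invariant: recomputed every iteration
    return out

def after(n, count):
    incremented = n + 1          # hoisted: computed once, before the loop
    out = []
    for _ in range(count):
        out.append(incremented)
    return out
```

The loop body still runs once per iteration to build the array; only the redundant calculation moves.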

Loop Unroll Transform

Loop unrolling reduces loop overhead by repeating a loop’s body multiple times in the generated code and reducing the total iteration count by the same factor.  This exposes opportunities for further optimizations at the expense of some increase in code size.
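As a hypothetical textual sketch (not LabVIEW code), a 4x unroll looks like this: four copies of the body per trip around the loop, one quarter of the loop-overhead checks, plus a short remainder loop for counts not divisible by four.

```python
# Sketch of loop unrolling by a factor of 4, not LabVIEW code.
def summed(xs):
    total = 0
    for x in xs:                       # one overhead check per element
        total += x
    return total

def summed_unrolled(xs):
    total = 0
    i, n = 0, len(xs)
    while i + 4 <= n:                  # one overhead check per 4 elements
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        i += 4
    while i < n:                       # remainder loop for the leftovers
        total += xs[i]
        i += 1
    return total
```

The unrolled version is longer, which is exactly the code-size cost the text describes.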

Benchmarks

For each release of LabVIEW, the LabVIEW development team maintains an entire farm of test computers running daily benchmarks for run-time performance. These benchmarks are largely computational tests focusing on the execution of LabVIEW block diagrams. The aggregate performance number across these tests shows an average 35 percent speedup from LabVIEW 2009 to LabVIEW 2010. Of course, that does not mean that every LabVIEW application sees a 35 percent speedup. It is not a trivial task to quantify LabVIEW performance. The typical LabVIEW application involves several diverse components. Whether it is a hardware device driver, DLL call, UI component, or standard G code, the performance of an application is dependent on each component. To better represent how a typical application performs in LabVIEW 2010, National Instruments put together a suite of real-world applications.

VI Performance Graph

The bins in Figure 9 represent LabVIEW 2010 performance in several of the specific use cases or industries where LabVIEW is used.

Figure 9. LabVIEW 2010 delivers significant run-time improvement compared to LabVIEW 2009.

Complex Math – Black-Scholes PDE Solver

The Black-Scholes model is a mathematical description of financial markets and derivative investment instruments. The Black-Scholes formula is the solution to the partial differential equation that describes this problem.
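For reference, the closed-form Black-Scholes price of a European call option, the kind of expression a benchmark like this evaluates many times, can be sketched as follows (a standard formula, not the benchmark VI itself):

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal cumulative distribution via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(spot, strike, t, rate, vol):
    """Closed-form Black-Scholes price of a European call option."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol * vol) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return spot * norm_cdf(d1) - strike * exp(-rate * t) * norm_cdf(d2)
```

With spot = strike = 100, one year to expiry, a 5 percent rate, and 20 percent volatility, the call is worth about 10.45.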

DAQ – Digital and Analog Waveform Reads 

This VI creates an NI-DAQmx task that waits for the card to transfer all of the data to memory. It then measures the time it takes to scale the data and transfer it to LabVIEW, as well as the overhead of the VI itself. A second task is created to help the first because of restrictions on timed digital acquisition with NI M Series devices.

Real-Time Math – MathScript Heat Equation

The MathScript Heat Equation is a shipping example included with the LabVIEW MathScript RT Module. This VI simulates the flow of heat across a thin plate from a single point source.  The VI solves a well-known elliptic partial differential equation.  It demonstrates using MathScript user-defined functions, the 3D scene graph, and mouse interaction with graph cursors.

Bit Manipulation – UnpackBits

This example is a bit-wise operation on a set of integer data types. Taken from the LabVIEW Modulation Toolkit, this example examines some lower-level primitives used in LabVIEW. Much of the speedup in this example is due to the new register allocation provided by the LLVM.

Real-Time Control – Advanced PID

This VI, available in the LabVIEW PID and Fuzzy Logic Toolkit, implements a PID controller using a PID algorithm with advanced optional features. The optional features include manual mode control with bumpless manual-to-automatic transitions, nonlinear integral action, two degrees-of-freedom control, and error-squared control. Use the DBL instance of this VI to implement a single control loop. Use the DBL Array instance to implement parallel multiloop control.

Real-Time Control – Single-Channel PID

This VI, also available in the LabVIEW PID and Fuzzy Logic Toolkit, implements a PID controller using a PID algorithm for simple PID applications or high-speed control applications that require an efficient algorithm. This PID algorithm features control output range limiting with integrator anti-windup and bumpless controller output for PID gain changes.

Parallel For Loop – Mandelbrot

The Mandelbrot set is a fractal whose computation parallelizes naturally but produces an unbalanced workload: the points in the set (black areas) take more calculations to compute than the points not in the set (colorful portions). In LabVIEW 2009, each loop instance computes one statically determined chunk of iterations. The work is not well balanced between the instances because some instances work on rows with more points in the set. In LabVIEW 2010, each loop instance requests a chunk of iterations, computes those iterations, and then requests another chunk. The work is balanced because an instance that finishes a chunk of rows with few points in the set can request another chunk of rows to help carry the load.
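The dynamic scheduling idea can be sketched in a textual language. This is a hypothetical illustration, not NI's implementation: workers repeatedly grab the next chunk of rows from a shared counter instead of being handed one fixed block up front, so workers that draw cheap rows absorb more chunks.

```python
import threading
from itertools import count

def mandelbrot_iters(cx, cy, max_iter=100):
    """Iteration count before escape; points in the set cost max_iter."""
    x = y = 0.0
    for i in range(max_iter):
        x, y = x * x - y * y + cx, 2 * x * y + cy
        if x * x + y * y > 4.0:
            return i
    return max_iter

def render(width=64, height=64, workers=4, chunk=8):
    """Render iteration counts using dynamic chunk scheduling."""
    image = [[0] * width for _ in range(height)]
    next_chunk = count(0)              # shared "chunk queue"
    lock = threading.Lock()

    def worker():
        while True:
            with lock:                 # grab the next chunk of rows
                start = next(next_chunk) * chunk
            if start >= height:
                return
            for row in range(start, min(start + chunk, height)):
                for col in range(width):
                    cx = -2.0 + 3.0 * col / width
                    cy = -1.5 + 3.0 * row / height
                    image[row][col] = mandelbrot_iters(cx, cy)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return image
```

A worker that lands on rows dominated by in-set points spends longer on its chunk, while the others keep pulling new chunks, which is the load balancing described above.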

Large Array Math – Linear Scale (Multiply and Add)

This example performs a multiply operation followed by an add operation on a 4 million element array. Because of the SSE optimizations available through the LLVM, element-wise array operations such as this speed up drastically.
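As a point of reference, the scalar form of this benchmark is just one multiply and one add per element; the sketch below (not the benchmark VI) shows that shape. With SSE, the compiler performs the same work on several packed elements per instruction instead of one.

```python
# Scalar sketch of the Linear Scale benchmark, not LabVIEW code:
# one multiply and one add per element of the input array.
def linear_scale(xs, gain, offset):
    return [gain * x + offset for x in xs]
```

The benchmark applies this over 4 million elements, which is where per-instruction savings add up.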

Performance Caveats

Although the compiler improvements will speed up G code significantly, many applications won't observe a large overall speedup. Almost all LabVIEW applications include some mix of hardware calls, built-in analysis, or UI components.

Interacting with Hardware

The performance of a LabVIEW program is often limited by acquisition speed, whether by the physical sampling rate setting or by a bottleneck in a PC-based bus such as Ethernet, USB, or disk access. When this is the case, speeding up smaller portions of G code doesn't affect the overall speed because the application is still limited by the hardware interaction.

Call Library Node Calls

Many of the built-in analysis functions in LabVIEW, such as those in the Advanced Analysis Library (AAL), simply call into a compiled DLL using the Call Library Node. Because these calls execute code external to LabVIEW, they won't benefit from the improvements to the LabVIEW compiler.

User Interface Components

LabVIEW components that control the user interface, such as controls, indicators, and VI Server Property and Invoke Nodes, show no expected improvement in execution speed or update rate because they are not affected by the LabVIEW compiler.

Other LabVIEW 2010 Performance Metrics

With each new version, NI tracks several metrics to ensure that LabVIEW remains usable on common PCs. Many of these metrics show negligible changes from LabVIEW 2009, but a few were directly affected by the updates to optimize the LabVIEW compiler. This section discusses many of the remaining metrics used to track LabVIEW 2010 performance.

Compile Time

The improvements that allow the compiler to optimize your LabVIEW code require the compiler to be "smarter": it must understand when code is unnecessary and how to "rewrite" code to run faster. As the compiler becomes smarter, more work is completed with each compile pass, which increases the compile time itself. Although NI measured an average compile time increase of 5X, the common LabVIEW application still has the "instant compile" feeling. The increase in compile time isn't linear, however, so larger diagrams see a bigger increase in their compile times.

Launch Time

The launch time metric for LabVIEW measures the time from launching LabVIEW.EXE to being able to use the program. This number can be affected by many factors, such as the OS, computer architecture, background processes, and the placement of LabVIEW in memory. NI tracks two different launch measurements.

Cold Launch Time

Cold launch time is the time it takes to launch LabVIEW when the disk cache is empty. For example, this time measures the launch time for LabVIEW upon a computer reboot. Cold launch time has improved by 16 percent with LabVIEW 2010.

Warm Launch Time

When LabVIEW is loaded in memory, the OS caches a lot of the information and memory needed to run LabVIEW. Therefore, “relaunching” LabVIEW is significantly faster than launching LabVIEW the first time. Warm launch time has increased by 9 percent with LabVIEW 2010.

Load Time

Load time is measured as the time it takes to load an entire VI hierarchy from disk into memory. NI testing has shown that "cold" load time is heavily affected by the physical disk location of the files. The average load time variation due to changes in LabVIEW 2010 is negligible, but you may see an increase or decrease depending on OS and disk differences, which LabVIEW cannot control.

Application Builder Build Time

Building an executable is one way to improve the overall performance of your LabVIEW application. You can use the LabVIEW Application Builder add-on to build executables, installers, .NET interop assemblies, and packed project libraries, among other build specifications. The time it takes to build a LabVIEW executable is a metric NI tracks with each release of LabVIEW. Building an executable requires compiling all the VIs it includes, and because compiling is slower due to the smarter compiler, building applications is also slower. However, compiling is only one portion of the application building process, so the overall build time increases by a smaller factor: an average of 35 percent.

Mass Compile

The "mass compile" metric measures the mass compile operation under the Tools»Advanced»Mass Compile menu option. You can point the mass compile process at a directory of VIs, which LabVIEW loads, compiles, and saves. This process shows an average slowdown of 35 percent.

Real-Time Deploy Time

This metric measures the time it takes to deploy an application to a real-time hardware target. This involves not only compiling the VI for the specific processor architecture running on the hardware but also the transfer of files over the network to the target. Real-time deploy time in LabVIEW 2010 increases by 51 percent.

Size Metrics (on disk)

These metrics measure the static memory consumption on the hard drive. VIs in LabVIEW 2010 are on average 4 percent bigger on disk.


3. What Changes Can You Make? 

Of course, writing “better” LabVIEW code can always produce better run-time performance. Consider the following options, both new in LabVIEW 2010 and available in previous versions.

New Features in LabVIEW 2010

LabVIEW 2010 includes several features and enhancements, other than those in the compiler, that you can use to improve your application run times.

SubVI Inlining

By creating modular applications using subVIs, you can easily reuse code segments without having to rewrite code in several places. LabVIEW 2010 introduces the ability to inline subVIs into their calling VIs. This completely eliminates subVI call overhead and increases code optimization possibilities in your calling VIs. When you inline a subVI, LabVIEW inserts the subVI source code into the source code of the calling VI and then compiles all the code. Inlining subVIs is most useful for small subVIs, subVIs within a loop, VIs with unwired outputs, or subVIs you call only once. It does cause your calling VI to grow larger because it includes the subVI’s code, so this optimization features a direct trade-off between execution speed and VI size.
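The trade-off can be sketched in a textual language (a conceptual illustration, not LabVIEW code): the body of the "subVI" is copied into its caller, removing call overhead and letting later optimizations see across the old boundary, at the cost of a larger caller.

```python
# Conceptual sketch of subVI inlining, not LabVIEW code.
def scale_and_offset(x):             # the small "subVI"
    return 2 * x + 1

def caller(xs):
    return [scale_and_offset(x) for x in xs]   # one call per element

def caller_inlined(xs):              # what the compiler effectively emits
    return [2 * x + 1 for x in xs]   # no call overhead inside the loop
```

Both produce identical results; the inlined form simply pays no per-element call cost, which is why inlining helps most for small subVIs called inside loops.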

Learn more

Parallel For Loop

With the parallel For Loop, introduced in LabVIEW 2009, you can execute individual iterations of For Loops in parallel by distributing them among multiple processor cores. You specify the maximum number of cores that can be used, and LabVIEW distributes the iterations among the cores available. LabVIEW 2010 provides a revamped scheduling algorithm, which improves the run-time performance of the parallel For Loop, especially when the iterations perform varying amounts of work.

Learn more

Saving VIs without the Compiled Code

LabVIEW 2010 can improve developer workflow for those using source code control. This new VI Properties option separates the compiled objects from the actual source code that you write in LabVIEW.  With this new setting turned on, a VI does not show up as modified after being recompiled, which eliminates the need to resave and resubmit files to source code control unless the developer has changed the graphical source code.

Learn more

Packed Project Libraries

The new packed project libraries allow for a more modular software design and development process, shorter build times, and easier deployment. These libraries are a new LabVIEW file type that places a project library and all of its referenced VIs into a single file. The exported VIs contained within this file behave like other VIs saved without block diagrams, except their hierarchy is completely hidden. You create them from a build specification that selects an existing .lvlib file and defines which VIs are built into the packed project library and which are public (exported).

 

Polymorphic Primitives on Large Arrays

Most modern machine architectures support Streaming SIMD Extensions (SSE) instructions. LabVIEW 2010 uses SSE instructions for an explicit set of functions with an array input. This drastically improves the performance of these primitives because a single instruction operates on multiple values at once. Table 1 shows which LabVIEW functions are supported by this improvement (✓ = supported, – = not supported):

Function                                        Double  Float  (u)int8  (u)int16  (u)int32  (u)int64
Add                                               ✓       ✓      ✓        ✓         ✓         ✓
Subtract                                          ✓       ✓      ✓        ✓         ✓         ✓
Multiply                                          ✓       ✓      ✓        ✓         ✓         ✓
Divide                                            ✓       ✓      –        –         –         –
Minimize                                          ✓       ✓      ✓        ✓         ✓         ✓
Maximize                                          ✓       ✓      ✓        ✓         ✓         ✓
Reciprocal                                        ✓       ✓      –        –         –         –
Increment                                         ✓       ✓      ✓        ✓         ✓         ✓
Decrement                                         ✓       ✓      ✓        ✓         ✓         ✓
Square                                            ✓       ✓      ✓        ✓         ✓         ✓
Square-root                                       ✓       ✓      –        –         –         –
In Range and Coerce                               ✓       ✓      ✓        ✓         ✓         ✓
Sum                                               ✓       ✓      ✓        ✓         ✓         ✓
Product                                           ✓       ✓      ✓        ✓         ✓         ✓
To Double                                                 ✓      ✓        ✓         ✓         –
To Float                                          ✓              ✓        ✓         ✓         –
To Int32, Int16, Int8, UInt32, UInt16, UInt8      ✓       ✓      –        –         –         –

Table 1. NI used SSE instructions to improve the performance of several LabVIEW primitives.

Classic Ways to Improve Performance

Although these new features and compiler optimizations in LabVIEW 2010 will increase the run-time performance of your applications, the classic ways of improving the performance of LabVIEW applications still apply. Here is a glimpse at two ways to improve your LabVIEW application performance.

VI Profiler

The VI profiler is perhaps the most useful performance optimization tool in LabVIEW. Available from the Tools menu at Tools»Profile»Performance and Memory…, it offers a rough estimate of the average execution time of the VIs in your application. Because the measurement is limited to millisecond resolution, execution time must be averaged over many iterations of the code. The VI Time measurement tells you the total amount of time spent executing each VI. A subVI may take very little time per call, but if it is called thousands of times, its total time can dominate; that is where you want to focus your optimization efforts.
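The same measurement strategy applies in any language. Here is a hypothetical textual analogue (the routine name is invented for illustration): a routine that is cheap per call is timed over many iterations, because a single call is below the timer's resolution, and total time rather than per-call time identifies the hot spot.

```python
import timeit

# Hypothetical stand-in for a small subVI called thousands of times:
# cheap per call, yet its total time can dominate the application.
def small_hot_routine():
    return sum(range(100))

calls = 10_000
total_s = timeit.timeit(small_hot_routine, number=calls)  # averaged timing
per_call_us = total_s / calls * 1e6                       # microseconds/call
```

`total_s` is the number the profiler's VI Time column corresponds to; optimizing a routine with a large total pays off even when each individual call looks fast.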

Disable Debugging 

One method to improve the performance of your VI is to disable debugging. During development you typically want to keep this option enabled, but once the application is finished, or you are done debugging, you can disable it on the Execution page of the VI Properties dialog.

Figure 10. VI Properties Dialog to Disable Debugging

More Information

Learn more about these and other techniques by browsing this presentation, which includes slides, examples, and instructions for performing optimizations on your LabVIEW code.

NI Customer Education recently introduced the LabVIEW Performance course, which is designed to help you write better code and get the most out of your applications. This two-day classroom or three half-day online course covers topics such as managing threads, optimizing code for memory, optimizing code for execution speed, and more. View the course details.

