### 1. High-Performance, Multithreaded Analysis

As stated before, the NI LabVIEW Multicore Analysis and Sparse Matrix Toolkit offers a range of high-performance, multithreaded analysis libraries for processing large data sets. On Windows, these libraries are based on the Intel® Math Kernel Library (Intel® MKL), a library of highly optimized, extensively threaded math routines for applications that require maximum performance. The libraries also extend to LabVIEW Real-Time (ETS) targets when you use this toolkit with the LabVIEW Real-Time Module. As a result, users have access to multithreaded linear algebra, Basic Linear Algebra Subprograms (BLAS), and Fast Fourier Transform (FFT) based functions.

**Figure 1.** The Multicore Analysis and Sparse Matrix palette contains high-performance, multithreaded analysis and sparse linear algebra functions

This toolkit provides a variety of libraries that take better advantage of multicore CPUs on a per-function basis. A set of thread management functions is included so that users can fine-tune the threading behavior of their applications.

**Figure 2.** The Thread Management palette includes functions to effectively manage threads

### 2. Managing Threads

This section presents a method for managing threads, using the product of two matrices as an example. The first thing to consider when managing threads is the number of cores available on the system where the function will execute. To get the CPU characteristics of the target system, use the *CPU Information* function. The *Get Threads* function from the Thread Management palette can be placed immediately after it to obtain the maximum number of available threads. This function reports the maximum number of threads available to linear algebra functions, transform functions, or all other functions. This categorization defines the function domain for each thread or group of threads and allows further thread management depending on the application.

**Figure 3.** Getting the processor characteristics gives an idea of the maximum number of available threads

Keep in mind that the actual number of threads that LabVIEW uses depends on the problem size, system resources, and other considerations. By default, the Multicore Analysis and Sparse Matrix VIs use the number of physical cores as the maximum number of threads unless you specify a smaller number. With this in mind, an arbitrary number of threads can be set using the *Set Threads* function; this must be done before executing the code to be parallelized. That is, the program flow must be sequenced so that the thread count is set before the affected code runs, which gives better control over the number of threads generated for a given piece of code. It is not recommended to execute functions from the Multicore Analysis and Sparse Matrix library in parallel with each other. Close attention is therefore required when managing threads to avoid problems such as oversubscription, which happens when the number of threads trying to run exceeds the number of available logical cores.

**Figure 4.** Thread management must be performed in sequence to avoid parallel performance problems
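The get, set, compute sequence described above can be sketched in text form. LabVIEW is a graphical language, so the Python below is only an analogy: `get_threads`, `set_threads`, and `thread_limit` are hypothetical stand-ins for the *Get Threads* and *Set Threads* functions, not the toolkit's API. The sketch also assumes that `os.cpu_count()` is an acceptable proxy for the core count, although it reports logical rather than physical cores.

```python
import os
from contextlib import contextmanager

# Hypothetical stand-in for the toolkit's thread setting; not a real API.
_settings = {"max_threads": os.cpu_count() or 1}


def get_threads():
    """Return the currently configured maximum number of threads."""
    return _settings["max_threads"]


def set_threads(requested):
    """Clamp the request to [1, logical cores] to avoid oversubscription."""
    limit = os.cpu_count() or 1
    _settings["max_threads"] = max(1, min(requested, limit))
    return _settings["max_threads"]


@contextmanager
def thread_limit(requested):
    """Set the thread count before a block runs and restore it afterwards."""
    previous = get_threads()
    set_threads(requested)
    try:
        yield get_threads()
    finally:
        set_threads(previous)


def matrix_product(a, b):
    """Plain single-threaded product; stands in for the code to be parallelized."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


# Sequence: set the thread count first, then run the computation.
with thread_limit(2) as active:
    product = matrix_product([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The context manager mirrors the sequencing requirement: the thread count is fixed before the computation starts and restored once it finishes, so two differently configured computations never overlap. The product itself is single-threaded here; the thread count is bookkeeping only.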

This programming architecture can grow to include threads in other function domains or to reassign the number of threads for a given piece of code. The graph below shows benchmark results for the matrix product problem with different numbers of threads on a computer with an Intel i7 2600 quad-core CPU.

**Figure 5.** Computation time and performance improvement benchmarks for the matrix product operation on a computer with an Intel i7 2600 quad-core CPU

Furthermore, threads can be defined for different function domains during the sequence. For instance, in a quad-core system, two threads can be assigned to transform functions and one thread to all functions other than transform functions.

**Figure 6.** Threads can be defined for different function domains
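A per-domain allocation like the one above can be checked against the core count before it is applied. The following is a minimal sketch; the domain names and the `fits_core_budget` helper are hypothetical and are not part of the toolkit. It takes the conservative assumption that threads from different domains could be active at the same time, so their sum is compared with the number of logical cores.

```python
# Hypothetical domain names mirroring the three categories in the text.
DOMAINS = ("linear_algebra", "transform", "other")


def fits_core_budget(allocation, logical_cores):
    """Return True if the per-domain thread counts do not oversubscribe the CPU."""
    unknown = set(allocation) - set(DOMAINS)
    if unknown:
        raise ValueError(f"unknown domains: {sorted(unknown)}")
    if any(count < 1 for count in allocation.values()):
        raise ValueError("each domain needs at least one thread")
    return sum(allocation.values()) <= logical_cores


# The quad-core example from the text: two transform threads, one for the rest.
ok = fits_core_budget({"transform": 2, "other": 1}, logical_cores=4)
```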

### 3. Sparse Matrix Functions

A sparse matrix is a matrix populated primarily with zeros; such matrices are widely used in numerical analysis. In contrast, a matrix in which the majority of elements are nonzero is commonly called a dense matrix. Sparse matrices offer a much more efficient way to store data, and special computation techniques can be used in analysis routines to complete operations in less time. For instance, the graph below compares dense and sparse matrices for an AxB function; a density of 0.01 was used for the sparse matrices.

**Figure 7.** Sparse matrices substantially reduce memory requirements, depending on the number and distribution of nonzero entries
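The memory argument can be made concrete with a rough estimate. The sketch below assumes a compressed sparse row (CSR) layout with 8-byte values and 4-byte indices. That layout is an assumption chosen for illustration, since the storage format the toolkit uses internally is not described here, and the function names are hypothetical.

```python
def dense_bytes(rows, cols, value_size=8):
    """Memory for a dense matrix that stores every element."""
    return rows * cols * value_size


def csr_bytes(rows, cols, density, value_size=8, index_size=4):
    """Memory for an assumed CSR layout: values, column indices, row pointers."""
    nonzeros = round(rows * cols * density)
    return nonzeros * (value_size + index_size) + (rows + 1) * index_size


# A 1000 x 1000 double-precision matrix at the density of 0.01 used in the text.
dense = dense_bytes(1000, 1000)       # 8,000,000 bytes
sparse = csr_bytes(1000, 1000, 0.01)  # 124,004 bytes
```

Under these assumptions the sparse representation needs roughly 1/64 of the dense storage. The ratio shrinks as density rises, because each stored nonzero carries index overhead in addition to its value.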

The Multicore Analysis and Sparse Matrix Toolkit offers a wide range of Matrix VIs to manipulate the elements, diagonals, and submatrices of a sparse matrix. Users can use these functions to solve challenging problems involving matrices that were previously too large to store or process efficiently.

**Figure 8.** The Matrix palette contains a series of VIs to manipulate sparse matrices

The Matrix VIs use a Sparse Matrix object to represent this kind of matrix. The toolkit includes comprehensive functions to perform conversions to and from sparse matrices.

**Figure 9.** Sparse matrices are represented through an object in LabVIEW
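The idea of a sparse matrix object with conversions to and from dense form can be illustrated as follows. This is a sketch only: the `SparseMatrix` class, its coordinate-dictionary storage, and its method names are hypothetical and do not reflect how the LabVIEW object is implemented.

```python
class SparseMatrix:
    """Illustrative sparse matrix that stores only nonzero entries."""

    def __init__(self, rows, cols, entries=None):
        self.rows = rows
        self.cols = cols
        self.entries = dict(entries or {})  # (row, col) -> nonzero value

    @classmethod
    def from_dense(cls, dense):
        """Build a sparse matrix from a list of equally sized rows."""
        rows = len(dense)
        cols = len(dense[0]) if rows else 0
        entries = {
            (i, j): value
            for i, row in enumerate(dense)
            for j, value in enumerate(row)
            if value != 0
        }
        return cls(rows, cols, entries)

    def to_dense(self):
        """Expand back to a full list-of-rows matrix filled with zeros."""
        dense = [[0.0] * self.cols for _ in range(self.rows)]
        for (i, j), value in self.entries.items():
            dense[i][j] = value
        return dense

    def diagonal(self):
        """Return the main diagonal, reading missing entries as zero."""
        size = min(self.rows, self.cols)
        return [self.entries.get((i, i), 0.0) for i in range(size)]


original = [[1.0, 0.0, 0.0], [0.0, 0.0, 2.0], [0.0, 3.0, 0.0]]
sparse = SparseMatrix.from_dense(original)
round_trip = sparse.to_dense()
```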

### 4. Data Type Support

Functions within the Multicore Analysis and Sparse Matrix Toolkit support both single-precision and double-precision floating-point data types. Operations that require less precision can therefore be computed using less memory and in less time. Refer to the table below for further information about function compatibility.

**Table 1.** Data type support for the Multicore Analysis and Sparse Matrix Toolkit

Use single-precision data type instances when performance and memory savings are important and the operations will not overflow the range of single-precision floating-point numbers. The following graph shows benchmark results for the matrix product function and the FFT function using single- and double-precision data types.

**Figure 10.** Single-precision computations can achieve a 2x speedup compared to double-precision computations for AxB and FFT functions
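The two conditions for choosing single precision, memory savings and staying within range, can be checked with a short sketch. The helper names below are hypothetical; the code relies only on the Python standard library and on the IEEE 754 single-precision format, whose largest finite value is (2 − 2⁻²³) · 2¹²⁷.

```python
import struct
from array import array

# Largest finite IEEE 754 single-precision value, about 3.4e38.
FLOAT32_MAX = (2 - 2**-23) * 2.0**127


def fits_single(values):
    """True if every value lies within the finite single-precision range."""
    return all(abs(v) <= FLOAT32_MAX for v in values)


def storage_bytes(count, typecode):
    """Bytes needed for `count` values stored as 'f' (single) or 'd' (double)."""
    return count * array(typecode).itemsize


def to_single(value):
    """Round a double-precision value to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


single = storage_bytes(1_000_000, "f")
double = storage_bytes(1_000_000, "d")
rounding_error = abs(to_single(0.1) - 0.1)
```

On typical platforms the single-precision buffer is half the size of the double-precision one. The rounding error illustrates the precision that is given up in exchange.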

### Performance Considerations

Even though NI LabVIEW already manages threads efficiently when running on a multicore target, many applications can benefit from the multithreaded functions presented in this document. While application-level speedup is highly dependent on the code, initial benchmarks and customer feedback show that individual functions can achieve as much as a **4-7x speedup**. These improvements become more pronounced when working with large data sets.

This section presents benchmark results for the Multicore Analysis and Sparse Matrix Toolkit for several matrix and FFT operations using different data types. These analyses were performed against the LabVIEW Advanced Analysis Library VIs on two different targets: first, a quad-core i7 @ 3.4 GHz with 8 GB RAM running 64-bit Windows 7; second, an NI PXI-8106 controller with a Core 2 Duo T7400 @ 2.16 GHz and 512 MB RAM running LabVIEW Real-Time 2012 SMP.

**Table 2.** Performance benchmarking results for Multicore Analysis and Sparse Matrix Toolkit VIs using 4 threads vs. LabVIEW Advanced Analysis Library VIs on a quad-core i7 64-bit CPU

**Table 3.** Performance benchmarking results for Multicore Analysis and Sparse Matrix Toolkit VIs using 4 threads vs. LabVIEW Advanced Analysis Library VIs on a Core 2 Duo RT NI PXI-8106 controller
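Speedup figures of this kind are usually computed as the ratio of baseline time to optimized time. The sketch below shows one way to measure it. It assumes that definition and a best-of-N timing policy, neither of which is confirmed as the method used for the tables above, and the function names are hypothetical.

```python
import time


def best_time(func, *args, repeats=5):
    """Run func several times and return the shortest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds, optimized_seconds):
    """Speedup factor: how many times faster the optimized version runs."""
    return baseline_seconds / optimized_seconds


# Example: a baseline of 2.0 s against an optimized 0.5 s is a 4x speedup.
factor = speedup(2.0, 0.5)
```

Taking the best of several runs reduces the influence of one-off delays such as cache warm-up or scheduling noise, which matters when comparing short-running functions.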

### 5. Conclusions

The NI LabVIEW Multicore Analysis and Sparse Matrix Toolkit provides high-performance, multithreaded analysis libraries for use in LabVIEW. It is suited to applications that need to process large data sets, and it runs on Windows and LabVIEW Real-Time (ETS) targets. The toolkit gives multicore programmers a wider set of tools to efficiently manage threads executing on multiple cores and in different function domains. It also includes a full set of multithreading-ready functions for manipulating sparse matrices. Although these analysis functions can take better advantage of multithreading, users must pay close attention to the adverse effects of creating too many threads.