Parallel architectures such as FPGAs and GPUs have become widely used to accelerate the analysis of large data sets. Both technologies allow compute-intensive portions of an algorithm to be offloaded from the CPU for processing on these highly parallel architectures. FPGAs offer a high degree of flexibility and low-latency processing; however, they face limitations in floating-point computation due to space constraints on the chip. GPUs have gained popularity as a flexible, accessible, and low-cost option for parallel processing, and they can be used successfully in conjunction with FPGAs to optimize an algorithm’s execution speed. For example, inline computations for an algorithm could be performed rapidly on an FPGA while a GPU analyzes floating-point data. Creating algorithms for GPU computing is facilitated by the NVIDIA® Compute Unified Device Architecture (CUDA™), which allows users to write code in the C programming language with NVIDIA extensions.
Figure 1: FPGAs and GPUs can be used together in conjunction with a CPU to optimize performance
In the field of real-time high-performance computing, there are many applications with data and task requirements that map well to processing on a GPU. Algorithms with a high degree of arithmetic intensity are well suited to GPUs: a high ratio of arithmetic operations to memory operations indicates that a significant speed-up can be achieved on a GPU architecture. For example, applications that handle multi-channel operations, such as computing several Fast Fourier Transforms (FFTs) in parallel, or mathematical workloads such as large matrix operations, map efficiently to GPUs.
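To make the notion of arithmetic intensity concrete, the following back-of-the-envelope sketch estimates the ratio of floating-point operations to bytes transferred for a single FFT. The ~5·N·log₂(N) operation count is the standard rough figure for a radix-2 FFT; the exact numbers are illustrative, not a property of any particular GPU library.

```python
import numpy as np

# Rough arithmetic-intensity estimate for one FFT (illustrative figures
# only; real counts depend on the FFT implementation and hardware).
fft_size = 4096                             # points per channel
flops = 5 * fft_size * np.log2(fft_size)    # ~5*N*log2(N) ops for a radix-2 FFT
bytes_moved = 2 * fft_size * 8              # complex64 samples in and out, 8 bytes each
intensity = flops / bytes_moved

print(f"~{intensity:.2f} floating-point ops per byte transferred")
```

The ratio grows with log₂(N), which is one reason large, batched FFTs amortize the cost of CPU-to-GPU data transfer well.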
The LabVIEW GPU Analysis Toolkit enables developers to harness the power of a GPU’s parallel architecture within the framework of a LabVIEW application. The toolkit leverages the functionality of NVIDIA’s CUDA toolkit, as well as the CUBLAS and CUFFT libraries, while allowing developers to reuse code already written for a GPU through the LVGPU SDK.
The LabVIEW GPU Analysis Toolkit
The LabVIEW GPU Analysis Toolkit allows developers to take advantage of key NVIDIA CUDA resources, as well as CUBLAS and CUFFT functions, by leveraging wrapped functions from these libraries within LabVIEW. For advanced operations, code that has already been developed in CUDA can be offloaded to a GPU through the toolkit. Note that the toolkit does not compile LabVIEW code for execution on a GPU; rather, it enables wrapped CUDA functions or user-defined CUDA kernels to be called from LabVIEW. By handling CUDA kernel operations and their parameters in a dedicated device execution environment, known as a CUDA context, advanced user-defined kernels can be called safely during LabVIEW execution. This context also ensures that all GPU resource and function requests are properly managed.
The introduction of the LabVIEW GPU Analysis Toolkit enables scientists and engineers to perform large-scale data acquisition, offload this data in blocks to a GPU for fast processing, and view the processed data within a single LabVIEW application. Common signal processing techniques and mathematical operations, such as computing the Fast Fourier Transform of a signal, are available through convenient VIs that call directly into the equivalent NVIDIA libraries. This allows developers to quickly prototype methods using all available computing resources. Complex applications that have been developed in CUDA can be brought into LabVIEW to rapidly process data with user-defined algorithms.
Offloading an FFT Operation for Rapid Processing on a GPU
In general, communication with a GPU using the LabVIEW GPU Analysis Toolkit can be broken down into three main stages: GPU initialization, performing computations on the GPU, and releasing GPU resources. The following section discusses offloading an FFT operation from the CPU to a GPU using the LabVIEW GPU Analysis Toolkit.
Figure 2: Program flow when offloading an FFT from the CPU to a GPU for analysis
This example computes the FFT of simulated signals from multiple channels and is intended to imitate the acquisition of multi-channel input data from a DAQ device or log file. The workflow is typical of offloading FFT and linear algebra operations to a GPU in a LabVIEW application.
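The computation this example offloads can be sketched on the CPU with NumPy for reference; on the GPU, the single batched FFT call below is what the toolkit's VIs perform across all channels in parallel. The channel count, FFT size, and per-channel sinusoids are illustrative stand-ins for acquired data, not part of the toolkit.

```python
import numpy as np

num_channels, fft_size = 8, 1024          # illustrative sizes
t = np.arange(fft_size) / fft_size

# Simulate multi-channel input: one sinusoid per channel, standing in
# for data acquired from a DAQ device or read from a log file.
signals = np.array(
    [np.sin(2 * np.pi * (k + 1) * t) for k in range(num_channels)],
    dtype=np.complex64,
)

# Batched FFT: one transform per row (per channel). On the GPU, this is
# the step that runs for every channel simultaneously.
spectra = np.fft.fft(signals, axis=1)

# Each channel's spectrum peaks at that channel's frequency bin.
peaks = np.abs(spectra[:, : fft_size // 2]).argmax(axis=1)
print(peaks)
```

Channel k carries k + 1 cycles over the window, so the peak bins come out as 1 through 8, one per channel.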
I. Initializing GPU Resources
Figure 3: Initializing GPU resources
First, a GPU device is selected using the Initialize Device VI; this creates the CUDA context used to facilitate communication between LabVIEW and the GPU. Next, the GPU is prepared for FFT operations by selecting the type of FFT to compute. The type includes the FFT size, the number of FFTs to perform in parallel on the GPU, and the data type of the input signals or spectra. This step reserves resources on the GPU to maximize performance. The Allocate Memory VI then creates a memory buffer on the GPU for an in-place FFT operation, which facilitates data transfer between the CPU and the GPU. The buffer holds both the channel data downloaded to the GPU and the results of the computations, which are uploaded back to the CPU once processing has completed.
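As a rough CPU-side analogue, the three "FFT type" parameters named above (FFT size, batch count, and data type) determine the size of the single buffer that an in-place operation needs. The sizes and dtype below are illustrative assumptions, not values prescribed by the toolkit.

```python
import numpy as np

fft_size, num_ffts = 2048, 16        # illustrative "FFT type" parameters
dtype = np.complex64                 # single-precision complex samples

# CPU-side stand-in for the GPU buffer created by the Allocate Memory VI:
# one contiguous block sized to hold every channel, reused in place for
# both the input signals and the computed spectra.
buffer = np.zeros((num_ffts, fft_size), dtype=dtype)

print(buffer.nbytes)   # bytes the in-place FFT buffer must hold
```

Because the operation is in place, no second output buffer of the same size is reserved; the spectra overwrite the signals.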
II. Performing FFT Computations on the GPU
Figure 4: Performing FFT computations on the GPU
Computation proceeds in three steps: data is transferred from the CPU to the buffer on the GPU; the FFT is computed on the GPU, which leverages its massively parallel architecture to transform every channel simultaneously; and the results are transferred from the GPU buffer back to an array on the CPU. The Download Data VI transfers the channels of data, stored in a LabVIEW array, from the CPU to the buffer allocated during the initialization phase. The FFT VI computes the spectrum of each downloaded channel in parallel. Finally, the Upload Data VI transfers the spectral data stored in the GPU buffer back to a LabVIEW array for use by the CPU.
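The download–compute–upload sequence can be mirrored on the CPU with NumPy; each commented step below stands in for the corresponding VI. This is a reference sketch of the data movement only — the array sizes are assumptions, and on the GPU the per-channel transforms actually run in parallel.

```python
import numpy as np

num_channels, fft_size = 4, 512
rng = np.random.default_rng(0)

# Stand-in for the GPU buffer allocated during initialization.
gpu_buffer = np.zeros((num_channels, fft_size), dtype=np.complex64)

# "Download Data": copy the CPU-side channel array into the buffer.
cpu_signals = rng.standard_normal((num_channels, fft_size)).astype(np.complex64)
gpu_buffer[:] = cpu_signals

# "FFT": transform every channel in place; on the GPU these run in parallel.
gpu_buffer[:] = np.fft.fft(gpu_buffer, axis=1)

# "Upload Data": copy the spectra back into a CPU-side array.
cpu_spectra = gpu_buffer.copy()
```

Note that the same buffer holds the signals before the transform and the spectra after it, matching the in-place allocation made during initialization.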
III. Releasing GPU Resources
Figure 5: Releasing GPU Resources
This final stage releases the GPU resources that were reserved during initialization. The Free Memory VI releases the GPU buffer that was used to store the FFT data. The Release Library VI frees the GPU resources that were reserved for FFT computations. Finally, the Release Device VI frees any resources reserved when the CUDA context was created to establish communication with the GPU.
Using the LabVIEW GPU Analysis Toolkit, developers can offload computationally intensive calculations to a GPU, freeing the CPU to work on other tasks. This gives LabVIEW users a powerful processing resource that was not previously available: acquired data can now be processed rapidly using not only FPGAs and CPUs but also GPUs, and viewed from a single LabVIEW application. As a result, users can utilize available system resources more effectively while minimizing the computational cost of highly parallel data processing operations and transformations.
NVIDIA and CUDA are trademarks or registered trademarks of NVIDIA Corporation. Other company and product names may be trademarks of the respective companies with which they are associated.