1. Introduction to LabVIEW FPGA IP Builder
As an add-on to the LabVIEW FPGA Module, LabVIEW FPGA IP Builder generates high-performance FPGA IP by combining high-level synthesis (HLS) technology with the power of LabVIEW. You can use LabVIEW FPGA IP Builder to:
- Automatically optimize your LabVIEW FPGA VIs
- Easily port LabVIEW Desktop code to the FPGA
- Quickly iterate with rapid performance and resource estimates
- Reuse your IP, unmodified, to adapt to different application requirements
This document covers the basic concepts behind LabVIEW FPGA IP Builder, including tool flow and optimization examples. To experience the benefits of LabVIEW FPGA IP Builder, download a fully functional evaluation copy and follow this tutorial. Hardware is not required.
LabVIEW FPGA IP Builder uses HLS technology to optimize algorithms written in LabVIEW. HLS generates efficient hardware designs from three components:
- An algorithm implemented in a high-level language such as LabVIEW
- Synthesis directives specifying the performance and resource objectives
- An FPGA target with known characteristics
Figure 1. Based on design directives for a specific target platform, HLS technology generates efficient HDL implementations of an algorithm implemented in a high-level language.
An HLS compiler’s output is an optimized implementation of the original algorithm, typically in a hardware description language (HDL) such as VHDL or Verilog. The compiler applies techniques to meet the objectives specified by the directives, based on knowledge of the hardware resources and other constraints. Users integrate the generated code back into the overall design, where it interacts with I/O and other components in the system.
You can take advantage of HLS technology within LabVIEW by using LabVIEW FPGA IP Builder. If you have existing algorithms written for LabVIEW for the desktop, or the LabVIEW Real-Time Module, you can port them to the FPGA without much modification or advanced optimization tricks.
If you are an advanced LabVIEW FPGA user, you can use this add-on to increase IP reuse by creating different directives that capture your objectives. Because these directives are stored separately from the code, you can easily explore design trade-offs and reuse your IP to meet multiple design constraints without having to modify or reverify it.
2. Using LabVIEW FPGA IP Builder
Tool Flow Overview
Creating optimized VIs with LabVIEW FPGA IP Builder is a straightforward and usually quick process, thanks to its rapid estimation capabilities. You start by creating or adapting an existing VI. Next, you specify directives and iterate on them until you are satisfied with the performance and resource utilization. Then proceed to generate your design, which automatically gives you VIs that call the generated VHDL. Finally, integrate those VIs back into your LabVIEW FPGA application.
Figure 2. Typical LabVIEW FPGA IP Builder Design Flow
With LabVIEW FPGA IP Builder, you can use higher-level code constructs such as large arrays, nested loops, and floating-point data in your code. These constructs are typically hard to use in an FPGA environment, but LabVIEW FPGA IP Builder automatically transforms them into more efficient implementations that retain the same functional behavior.
After you install LabVIEW FPGA IP Builder, it is listed as a new item under your LabVIEW FPGA Target in the LabVIEW project. You can start with an existing implementation of the desired IP and add a copy of the VI under the IP Builder item in the project. You can also create VIs under the IP Builder project item using regular LabVIEW syntax.
Figure 3. Sample View of LabVIEW Project With IP Builder Item as Parent of VI and Directives Items
The following example uses a simple FIR filter to demonstrate the adaptation and optimization process with LabVIEW FPGA IP Builder. The example is included with LabVIEW FPGA IP Builder, so you can download it and follow along, even if you don’t have NI FPGA hardware at hand. The original FIR filter, depicted below, uses floating-point data, arrays, and basic arithmetic inside a For Loop.
Figure 4. Original FIR Filter Algorithm Implementation
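For readers without the figure at hand, the structure described above (floating-point data, arrays, and multiply-accumulate arithmetic inside a loop) can be sketched in text form. This is a minimal Python reference model, not the shipping LabVIEW example; the function name `fir_filter` and the coefficient values are assumptions chosen for illustration.

```python
def fir_filter(samples, coefficients):
    """Direct-form FIR: y[n] = sum over k of coefficients[k] * x[n - k]."""
    taps = len(coefficients)
    history = [0.0] * taps              # fixed-size delay line, newest sample first
    output = []
    for x in samples:
        history = [x] + history[:-1]    # shift the new sample into the delay line
        acc = 0.0
        for k in range(taps):           # the inner loop a tool could unroll or pipeline
            acc += coefficients[k] * history[k]
        output.append(acc)
    return output


# Hypothetical 4-tap moving average applied to a step input
print(fir_filter([1.0, 1.0, 1.0, 1.0, 1.0], [0.25, 0.25, 0.25, 0.25]))
```

A text model like this can also serve as a golden reference when writing test benches for the adapted VI.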
If you were to optimize this filter in LabVIEW FPGA to achieve higher throughput, you would have to manually unroll the loop, add pipelining registers, and apply other optimization techniques to achieve better performance, ending up with a very different diagram, as shown below.
Figure 5. Equivalent FIR Filter Implementation Optimized for FPGA Use Inside a Single-Cycle Timed Loop
With LabVIEW FPGA IP Builder, you can achieve comparable performance with few modifications to the original code, as shown below, and separately specify your optimization objectives through design directives, keeping your code readable and easy to maintain.
Figure 6. Equivalent Implementation of the FIR Filter to Be Consumed by LabVIEW FPGA IP Builder
The shipping example uses fixed-point data, and that version is used for this tutorial. LabVIEW FPGA IP Builder supports single-precision floating-point types starting with its 2013 release. You can still use the floating-point version of the filter in LabVIEW FPGA IP Builder, but at the expense of lower performance and higher resource utilization, which is discussed later in this document.
Even though you can write your code more naturally with LabVIEW FPGA IP Builder, you may still have to adapt your VIs before you can put them through the tool, mostly due to constraints inherent to FPGA technology and limited function support. Consider the following when developing IP with LabVIEW FPGA IP Builder:
- Limited function support: LabVIEW FPGA IP Builder currently supports a limited set of functions that cover basic logical and arithmetic operations as well as basic flow control constructs. If a required function is missing from the palette, you often can build it from these more fundamental pieces. The shipping examples show how to build common operations such as sine/cosine, arctangent, RMS, and others. Function support increases with each new version of the product. The product documentation contains an exact list of supported functions.
- Floating-point and other data-type support: LabVIEW FPGA IP Builder supports single-precision floating-point types. However, excessive use of these types may increase resource utilization and lower the performance of the generated IP. Minimize floating-point use and reserve it for parts of your code where the dynamic range of the type is required, or where mapping to a fixed-point representation is difficult. For other parts of your code, use integers or fixed-point types whenever possible. Be careful when performing this conversion because large integer and fixed-point word lengths can also hurt performance and resource utilization, and the conversion itself can cause numerical errors due to saturation and loss of precision. The product documentation contains an exact list of supported data types.
- Fixed-size arrays: The limited nature of FPGA resources means that data structures need to be bounded in size. You must convert all arrays to a fixed size before you can use them with the tool. The tool can convert arrays at the interface level (the connector pane of the top-level VI) into streaming, point-by-point interfaces, lowering resource utilization and matching how signal- and data-processing chains typically work. See the Interface Options section of the product documentation for more information on this topic.
- Limited function support for arrays: Not all arithmetic or logical functions currently support arrays as their inputs. You can easily wrap such operations in a For Loop and process one element at a time. The tool recognizes and optimizes such patterns, so there is no effective loss of performance.
- 2D and higher dimension arrays: LabVIEW FPGA IP Builder does not currently support n-dimensional arrays, for n>1. You can typically work around this restriction by storing data in a one-dimensional array and keeping track of the original dimension sizes as you access the array.
- Minimize array and cluster copies: As in LabVIEW for the desktop, minimize array/cluster branches and type coercion. The limited nature of FPGA resources means that large data structure propagation through the diagram needs to be carefully considered to avoid additional buffers and copies. The product documentation contains helpful tips on how to manage resource utilization and promote efficient access to array data structures.
You can execute the adapted VI under the LabVIEW FPGA IP Builder context. This execution is a type of simulation where you can write test benches and verify code behavior. You can also use the VIs under the My Computer context and write complex test benches that use the full analysis and display capabilities of LabVIEW. It is recommended that you test your adapted VI to verify its functionality. These tests are also useful to verify the optimized IP later on.
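The one-dimensional workaround for multidimensional arrays mentioned in the list above can be sketched as follows. This is an illustrative Python model rather than LabVIEW code; the dimensions and the helper name `flat_index` are hypothetical, and row-major ordering is an assumption (any consistent ordering works).

```python
ROWS, COLS = 3, 4   # hypothetical fixed dimensions, known at design time


def flat_index(row, col, cols=COLS):
    """Map a (row, col) pair to a position in a row-major 1D array."""
    return row * cols + col


# Build a small 2D data set and store it as a fixed-size 1D array
matrix_2d = [[r * 10 + c for c in range(COLS)] for r in range(ROWS)]
flat = [matrix_2d[r][c] for r in range(ROWS) for c in range(COLS)]

# Element (2, 3) of the original matrix lives at index 2 * 4 + 3 = 11
print(flat[flat_index(2, 3)])
```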
Once you are satisfied with the functional behavior of your code, you can proceed to create directives to achieve your performance and resource utilization goals.
You can create one or more sets of directives for a given VI under the IP Builder item in the LabVIEW project. The VIs under the IP Builder item, the associated directives, and the FPGA target provide LabVIEW FPGA IP Builder with all the information it needs to analyze and compile your code to meet your design goals.
The direct result of this separation between code and directives is that you do not have to change your code to optimize it as long as LabVIEW FPGA IP Builder can do it for you. Another benefit of this separation is the fact that you can then reuse your existing IP for multiple projects through the application of different design directives.
Figure 7. With LabVIEW FPGA IP Builder, you can reuse your IP to meet multiple design challenges through the use of different directives.
You create directives by right-clicking a VI, and they are stored as part of your project. You define directives through a configuration dialog, shown below, which you can access by double-clicking the directive item. The LabVIEW FPGA IP Builder Directives dialog is where you define your optimization goals and iterate using the estimates provided by the tool. With LabVIEW FPGA IP Builder, you can obtain these estimates much faster than the time it would take to compile an FPGA VI, so you can iterate much more quickly on your design. See the results section for a demonstration of this benefit.
Figure 8. With the Directives dialog, you can associate design constraints with parts of your diagram.
The Directives tab of the configuration dialog contains a screenshot and a tree representation of the diagram. If you select specific items from the tree view, the list of directives is updated to those that apply to the selected element. You can toggle individual directives and assign values as desired. The tree’s top-level item is selected by default, so the directives shown apply to the overall diagram.
The process of setting directives and estimating results is typically iterative. Start by specifying high-level directives, estimating, and then adding other directives or making small diagram changes to achieve higher performance. At any point, you may access context help for each directive by clicking the “?” button next to it. The LabVIEW FPGA IP Builder product documentation provides a suggested flowchart for configuring directives.
The configuration dialog also provides a tree representation of diagram elements and their directives. From this view you can start an estimation process, which usually takes anywhere from a few seconds to a few minutes, depending on the size of the diagram and the difficulty imposed by the design directives. Once the process is finished, performance and resource utilization reports appear in the same window.
Figure 9. LabVIEW FPGA IP Builder provides performance and resource estimates in a fraction of the time it takes to compile a design.
LabVIEW 2013 FPGA IP Builder features an in-product optimization advisor that provides feedback as you are iterating on your directives and design. The Design Feedback report is available after you go through an estimation process, and is accessible from the Reports pull-down menu.
Figure 10. The Design Feedback report provides tips on how to properly apply directives and modify your code to achieve better performance.
The following section is an example of how you can use directives to achieve your performance goals.
3. Optimizing VIs Through Directives in LabVIEW FPGA IP Builder
Throughput, latency, and resource utilization are the most common design factors in high-performance LabVIEW FPGA applications, and there is typically a trade-off between them. For example, you can often increase throughput at the expense of resources and latency. You can also reduce latency by parallelizing your code, when the algorithm permits it, leading to increased resource utilization. These trade-offs give the optimization process its iterative nature.
Figure 11. Balancing resources, throughput, and latency makes for an iterative optimization process.
Throughput Optimization with LabVIEW FPGA IP Builder
Use the FIR filter example included with LabVIEW FPGA IP Builder to learn how to increase throughput with directives. Out of the box, the default directive for the VI uses a clock rate of 40 MHz and an unspecified initiation interval. Figure 12 shows the initial estimate results.
Figure 12. Default Directive Settings for FIR Filter Example
Throughput is the result of three factors: the speed of the clock you are using to drive your design, how many cycles are required before your algorithm can accept new input, and how many samples the IP accepts per call. The number of cycles required before the algorithm can accept new inputs is also referred to as the initiation interval (II). In this context, we define throughput as the following:
$$\text{Throughput} = \frac{\text{Clock rate} \times \text{Samples per call}}{\text{Initiation interval}}$$
Samples per call is the number of data samples processed by each call to your VI, and is a fixed characteristic of your design. You can, for example, implement a parallel version of an algorithm that processes multiple samples per call, increasing throughput while keeping the clock rate and initiation interval constant.
The initial estimates indicate that the tool can achieve the requested clock rate at an initiation interval of 62 cycles, yielding a throughput of 645,161 samples per second at the requested clock rate.
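The arithmetic behind this figure can be checked with a short calculation. The Python sketch below is only an illustration of the throughput definition given above; the function and parameter names are assumptions, and one sample per call is assumed because that is what the stated numbers imply.

```python
def throughput(clock_rate_hz, initiation_interval, samples_per_call=1):
    """Samples per second = clock rate * samples per call / initiation interval."""
    return clock_rate_hz * samples_per_call / initiation_interval


# Baseline from the text: 40 MHz clock, initiation interval of 62 cycles
baseline = throughput(40e6, 62)
print(int(baseline))  # 645161
```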
This throughput may be enough for some applications. High-performance applications, however, require signal-processing throughput in the millions or billions of samples per second, and those rates are possible with modern FPGAs.
The simplest way to increase throughput is usually to increase the clock rate. For example, you can specify a much higher clock rate, 200 MHz, through the value of the clock rate directive. Figure 13 shows the directive settings and the results.
Figure 13. Estimates indicate that the design can surpass the requested clock rate with an increased initiation interval and slightly higher resource utilization.
The estimates indicate that the tool can generate a version of the IP that can compile at the requested clock rate, or even higher. Resource utilization barely changes but there is a 15-cycle increase in the initiation interval, from 62 to 77 cycles. The throughput at the requested rate is now:
$$\text{Throughput} = \frac{200\ \text{MHz} \times 1}{77\ \text{cycles}} \approx 2.6\ \text{MS/s}$$
This increase in clock rate quickly leads to a significant increase in performance. Increasing the clock rate works only to a certain extent as it also makes the compilation process harder, eventually making it impossible to generate a circuit that meets the requested timing constraints on the target platform. Other times, you may be restricted by other parts of your design that can’t compile at those higher rates or may require your IP to execute with a certain clock relationship with respect to the rest of the design.
Another effective way to increase IP throughput is to lower its initiation interval. Lowering the initiation interval equates to creating IP that can accept samples more often, that is, with fewer cycles in between calls. A common way to lower the initiation interval of a piece of code is to create a pipelined design.
The initiation interval of a piece of IP is different from its latency. A pipelined design accepts data after fewer cycles than the amount it takes for a given sample to go from input to output, which is its latency.
As an example, consider an oil pipeline from the real world. A pipeline with a flow rate (throughput) of one barrel/s means that a pump feeds one barrel every second into the pipeline. The overall latency of the pipeline is the time it takes for a given barrel of oil to travel from one end of the pipeline to the other (assuming oil actually traveled as a unit), and it is much more than one second for most real pipelines.
A manufacturing assembly line is another example of a pipelined process, where multiple stations assemble different parts of a product, working on many different orders at any given time. Adding stages to a pipelined process means splitting it into simpler stages, which can complete faster, increasing overall throughput at the cost of more workers and assembly hardware.
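The distinction between initiation interval and latency can also be made concrete with a small calculation. The Python sketch below is illustrative only: the function name and the cycle counts are hypothetical, and it assumes an idealized design that accepts one sample every initiation interval without stalls.

```python
def cycles_to_finish(num_samples, latency, initiation_interval):
    """Cycle count until the last of num_samples results leaves the IP."""
    if num_samples == 0:
        return 0
    # The first result appears after `latency` cycles; each later sample
    # enters `initiation_interval` cycles after the one before it.
    return latency + (num_samples - 1) * initiation_interval


# Same hypothetical 10-cycle latency, with and without pipelining
print(cycles_to_finish(100, latency=10, initiation_interval=10))  # 1000
print(cycles_to_finish(100, latency=10, initiation_interval=1))   # 109
```

In both cases a single sample still takes 10 cycles to travel from input to output; only the rate at which new samples can enter changes.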
With LabVIEW FPGA IP Builder you can request a lower initiation interval for the design and let the tool apply optimization techniques to try to meet it. When you set a top-level initiation interval, LabVIEW FPGA IP Builder will do two main things for you:
- Unroll all loops within your design
- Try to pipeline and/or parallelize the design
Loop unrolling consists of taking a loop structure and decomposing it into copies of its content based on the number of iterations the loop is expected to execute. Those copies can be executed in parallel if there are no data dependencies between successive iterations of the loop. The following pseudo-diagram shows an unrolled and parallelized VI.
Figure 14. Unrolling a loop can lead to a parallel implementation of its contents if there are no inter-iteration data dependencies, yielding higher throughput at the expense of more FPGA resources.
Parallelized sections of code directly increase their throughput by operating simultaneously on different parts of the data, at the expense of higher resource utilization.
If there are data dependencies between successive iterations of the loop, then unrolling the loop simply serializes the code, as shown below.
Figure 15. Unrolling a loop with inter-iteration data dependencies leads to a serialized version of the loop.
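The two unrolling outcomes can be sketched in text form. The Python functions below are a conceptual illustration, not output of the tool; the names and the fixed four-iteration loops are assumptions. In the first pair, each unrolled statement depends only on its own input, so hardware copies could run side by side. In the second pair, each statement needs the previous result, so unrolling produces a serial chain.

```python
def scale_loop(data, gain):
    out = [0] * 4
    for i in range(4):              # iterations are independent of each other
        out[i] = data[i] * gain
    return out


def scale_unrolled(data, gain):
    # Four independent copies of the loop body: candidates for parallel hardware
    return [data[0] * gain, data[1] * gain, data[2] * gain, data[3] * gain]


def accumulate_loop(data):
    acc = 0
    for i in range(4):              # each iteration needs the previous acc
        acc = acc + data[i]
    return acc


def accumulate_unrolled(data):
    # Unrolled copies form a chain: s1 cannot start before s0 is known
    s0 = 0 + data[0]
    s1 = s0 + data[1]
    s2 = s1 + data[2]
    s3 = s2 + data[3]
    return s3


print(scale_unrolled([1, 2, 3, 4], 3), accumulate_unrolled([1, 2, 3, 4]))
```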
Pipelining helps serial code execute faster by breaking up the processing chain into concurrent stages. In the pseudo-code below, A, B, and C can now execute concurrently when placed in a loop, each working on their own samples. The feed-forward nodes help break up the execution into concurrent pieces, a technique commonly used to optimize single-cycle Timed-Loop code.
Figure 16. Adding feed-forward nodes helps break up serial code into concurrent pieces, which facilitates higher clock rates.
Applying pipelining to the unrolled version of the loop would yield the following.
Figure 17. Adding pipelining to the serialized version of the loop increases its throughput by making its iterations work on different samples concurrently.
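A cycle-by-cycle model can help show why pipelined stages work on different samples at the same time. The Python sketch below is a simplified illustration under stated assumptions: the three stage functions are arbitrary placeholders for A, B, and C, and the two variables `reg_ab` and `reg_bc` stand in for the registers placed between stages.

```python
def stage_a(x):
    return x + 1


def stage_b(x):
    return x * 2


def stage_c(x):
    return x - 3


def run_pipeline(samples):
    reg_ab = None                    # register between stage A and stage B
    reg_bc = None                    # register between stage B and stage C
    outputs = []
    # Two extra empty cycles flush the last samples out of the registers
    for x in list(samples) + [None, None]:
        # Within one clock cycle every stage reads last cycle's register value
        out_c = stage_c(reg_bc) if reg_bc is not None else None
        next_bc = stage_b(reg_ab) if reg_ab is not None else None
        next_ab = stage_a(x) if x is not None else None
        reg_ab, reg_bc = next_ab, next_bc    # registers update at the clock edge
        if out_c is not None:
            outputs.append(out_c)
    return outputs


print(run_pipeline([1, 2, 3]))  # [1, 3, 5]
```

The results match applying C(B(A(x))) to each sample in order, but a new sample enters on every cycle instead of waiting for the previous one to finish all three stages.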
These techniques allow the IP to accept new samples more often, effectively increasing its throughput. LabVIEW FPGA IP Builder applies these techniques for you when you specify the initiation interval for the top-level VI or a specific structure. You can also guide the application of these techniques with more specific directives if necessary.
Note that LabVIEW FPGA IP Builder does not generate LabVIEW code as part of its intermediate output, so these diagrams are only for explanatory purposes.
Now that you know what LabVIEW FPGA IP Builder does when you set a specific initiation interval, you can request an initiation interval of one to achieve the best possible throughput at the specified clock rate. When you do so, you should see something similar to Figure 18.
Figure 18. Forcing the initiation interval yields higher throughput at the cost of increased resource utilization.
The estimates indicate that the tool can meet the requested clock rate and initiation interval at the cost of higher resource utilization. LabVIEW FPGA IP Builder achieves this by applying pipelining and parallelization techniques to meet more aggressive timing constraints. As explained above, those techniques typically use more FPGA resources.
The throughput at the requested clock rate is:
$$\text{Throughput} = \frac{200\ \text{MHz} \times 1}{1\ \text{cycle}} = 200\ \text{MS/s}$$
This represents an increase of 77 times the throughput obtained in the previous step, because the initiation interval drops from 77 cycles to one at the same clock rate, and it took enabling only one directive item to make it possible.
Figure 19 summarizes the requested and achieved performance and resource utilization for the three configurations. Notice how throughput is substantially increased by specifying a higher clock rate and smaller initiation interval. For the third configuration, the tool estimates that the generated filter can achieve a throughput of 200 million samples per second.
Figure 19. Resource and performance results for the three directive configurations. The first set represents the baseline results, the second shows the effect of increasing the clock rate, and the third set shows the effect of setting the initiation interval to 1
The estimation process took 30 seconds or less on our test machine, so the whole iterative estimation process took only a few minutes and resulted in a throughput increase of 310 times without code modifications.
None of these three configurations is inherently better than the other two. Using IP that exceeds the performance requirements of your application usually means you are paying a higher resource cost, so the “right” directives for a given application are not necessarily the ones that yield the highest throughput but rather the ones that meet your design requirements in the most efficient manner. Therefore, you can imagine using the first set of directives for applications where a throughput of 600 kS/s is adequate instead of using the best performing IP at a higher resource cost.
Without LabVIEW FPGA IP Builder, you have to develop and maintain multiple implementations of the same algorithm to serve different requirements, leading to poor reuse and increased development time. With LabVIEW FPGA IP Builder, you can easily explore performance and resource utilization trade-offs for a given piece of code while keeping it unchanged.
Verifying IP Performance and Resource Utilization
It is important to remember that these numbers are estimates and their accuracy can vary. The ultimate verification of performance is always trying to compile the generated IP using LabVIEW FPGA at the desired rate. This can be a tedious process because you must generate the IP and then insert it into your design or some small LabVIEW FPGA VI to compile the IP and check its actual rate and resource utilization. LabVIEW 2013 FPGA IP Builder introduces a new “thorough” estimation mechanism, which compiles your IP using the Xilinx toolchain for the desired FPGA target, providing more accurate resource estimates.
Figure 20. LabVIEW 2013 FPGA IP Builder offers a thorough estimation option that compiles your IP in isolation for faster and more accurate results.
This process is much faster than having to create the compilation harness. The compilation is also faster because it compiles only the IP itself, and not the harness or other VHDL that is added by LabVIEW FPGA outside of what you see on the diagram. As a point of reference, the thorough estimate for the FIR example took two minutes and 26 seconds. Compiling the generated IP as part of a larger harness takes 10 minutes on the same machine.
The following table shows the difference in estimated resource utilization and achievable rate between using a fixed-point and floating-point implementation of the FIR filter.
| Thorough Estimate Category | Fixed-point implementation at 350 MHz | Floating-point implementation at 350 MHz |
| Clock rate (MHz) | 357.65 | 266.03 |
Table 1. There can be a significant difference in performance and resource utilization between using fixed-point and floating-point types.
Floating-point types offer a convenient way to implement algorithms that must deal with fractional data and require a large dynamic range. As you can see, there can be a significant difference in performance and resource utilization when compared to a fixed-point implementation. You should therefore use floating-point types only when the algorithm requires the dynamic range or when mapping it to fixed-point would be difficult.
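The earlier caution about saturation and loss of precision when mapping values to fixed point can be illustrated with a small model. The Python sketch below is an assumption-laden illustration, not the LabVIEW fixed-point implementation: it models a signed type described by a word length and an integer word length (sign bit included), saturates on overflow, and uses Python's built-in `round` for quantization. The function name `to_fixed` is hypothetical.

```python
def to_fixed(value, word_length, integer_length):
    """Return the real value actually stored by a signed fixed-point type."""
    frac_bits = word_length - integer_length
    scale = 1 << frac_bits
    raw = round(value * scale)                    # quantize to the nearest step
    max_raw = (1 << (word_length - 1)) - 1
    min_raw = -(1 << (word_length - 1))
    raw = max(min_raw, min(max_raw, raw))         # saturate instead of wrapping
    return raw / scale


print(to_fixed(0.5, 16, 1))    # exactly representable: 0.5
print(to_fixed(0.1, 8, 1))     # precision loss: 0.1015625
print(to_fixed(1.5, 16, 1))    # saturation just below 1.0
```

Comparing the outputs of a floating-point reference against a quantized version like this is one way to judge whether a chosen word length is adequate before estimating resources.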
FPGA Resource Optimization with LabVIEW FPGA IP Builder
As a final example, this section briefly shows how you can achieve better performance and still minimize resource utilization using directives. The following diagram is a simple matrix-vector multiplication that takes an 8 x 8 matrix and an eight-element vector. You can download this example at the LabVIEW FPGA IP Builder Community.
Figure 21. This is a simple matrix-vector multiplication algorithm using LabVIEW FPGA IP Builder. Notice the use of higher level constructs such as nested loops, arrays, and subVI calls.
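As a text counterpart to the diagram, the algorithm can be sketched as nested loops with a helper routine playing the role of the subVI. This Python model is an illustration only: the function names are hypothetical, and storing the 8 x 8 matrix as a 64-element one-dimensional array in row-major order is an assumption made because the tool does not support multidimensional arrays.

```python
def dot_product(row, vector):
    """Helper playing the role of a subVI: multiply-accumulate over one row."""
    acc = 0
    for a, b in zip(row, vector):
        acc += a * b
    return acc


def matrix_vector_multiply(matrix_flat, vector, size=8):
    """Multiply a size x size matrix (stored row-major in 1D) by a vector."""
    result = [0] * size
    for r in range(size):
        row = matrix_flat[r * size:(r + 1) * size]
        result[r] = dot_product(row, vector)
    return result


# Identity matrix times a vector returns the vector unchanged
identity = [1 if i // 8 == i % 8 else 0 for i in range(64)]
print(matrix_vector_multiply(identity, [1, 2, 3, 4, 5, 6, 7, 8]))
```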
The following graph summarizes resource and performance results as different directives are applied to this VI. The first set of points represents the baseline performance, with a throughput of 15 MS/s and a latency of 358 cycles, before applying any directives to the design.
Figure 22. LabVIEW FPGA IP Builder can increase the throughput and lower the latency of the matrix-vector multiplication example, while keeping resource utilization low.
The second set of points in the progression shows how performance and latency improve as soon as the clock rate and initiation interval are specified. This increase in performance is accompanied by a substantial increase in resource utilization.
The third and fourth sets of points in the progression show how certain directives can help improve resource utilization while increasing or maintaining performance.
You can obtain the third set of points by changing the array interfaces at the top level to unbuffered. You can do this for array inputs and outputs as long as the elements in those arrays are accessed only once, sequentially, for every call of the VI. In the case of the matrix-vector multiply example, the matrix input and the result output array fit these criteria so the directives help LabVIEW FPGA IP Builder remove unnecessary buffers and related circuitry, making the design more efficient and with a lower initiation interval, yielding better throughput.
The fourth set of points shows a decrease in the number of multipliers (DSP48 elements) required by the design. Multipliers are often a scarce resource on FPGAs, so LabVIEW FPGA IP Builder offers a directive exclusively intended to optimize their use. The Share Multipliers directive instructs the tool to generate implementations in which multipliers are reused by different parts of the design. LabVIEW FPGA IP Builder will multiplex multiple streams of data to share the same multiplier resource, bringing the number of multipliers down from eight to two in this case.
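The trade-off behind multiplier sharing can be sketched as a simple time-multiplexing schedule. The Python model below is purely conceptual: the round-robin assignment and the function name `schedule_multiplies` are assumptions, and the scheduling the tool actually performs is not described in this document.

```python
def schedule_multiplies(operand_pairs, num_multipliers):
    """Assign each multiplication to a (cycle, multiplier) slot, round-robin."""
    products = []
    schedule = []
    for index, (a, b) in enumerate(operand_pairs):
        cycle, unit = divmod(index, num_multipliers)
        schedule.append((cycle, unit))
        products.append(a * b)
    cycles_used = -(-len(operand_pairs) // num_multipliers)   # ceiling division
    return products, schedule, cycles_used


pairs = [(i, i + 1) for i in range(8)]            # eight multiplications
_, _, cycles_with_eight = schedule_multiplies(pairs, 8)
_, _, cycles_with_two = schedule_multiplies(pairs, 2)
print(cycles_with_eight, cycles_with_two)  # 1 4
```

Fewer multipliers means each one is busy for more time slots, which is why sharing saves resources only when the design has spare cycles to spend.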
Refer to the product documentation for more tips on improving FPGA resource utilization.
4. Integrating the output of LabVIEW FPGA IP Builder
Once you are satisfied with the estimates for your VI, you can generate the design by creating a new build specification. You can create one or more build specifications per directive item in your project.
LabVIEW FPGA IP Builder build specifications are similar to build specifications found elsewhere in LabVIEW. You can specify the desired directive set as well as the name and location of your output file. You also can add comments to the build specification for documentation purposes.
After you have completed the build specification, LabVIEW FPGA IP Builder generates an optimized HDL implementation of the algorithm, configures an IP Integration Node to call it, and wraps that node in a new VI with the specified name. The generated VIs are automatically added to your project hierarchy so you can place them in a single-cycle Timed Loop, along with the rest of your application.
Figure 23. Building your IP combines the VI and optimization directives, generating optimized HDL that is automatically used in an IP Integration Node and placed inside a VI. You can then simply add the VI to your LabVIEW FPGA design.
The generated IP typically takes more than one cycle to execute and may also need to handshake with downstream blocks. This is similar to the four-wire handshaking protocol already present in many LabVIEW FPGA IP blocks. LabVIEW FPGA IP Builder automatically adds the necessary handshaking signals to your optimized VI and you must wire these signals so that you can inform upstream blocks when your IP is ready to consume data, signal downstream blocks when your IP has generated valid data, and know when downstream nodes are ready to accept it.
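The role of the handshaking signals on the output side can be illustrated with a simplified model. The Python sketch below is not the generated interface: it models only one direction of the exchange, the variable names `output_valid` and `ready_for_output` are labels chosen to match the description above, and the downstream readiness pattern is hypothetical.

```python
def transfer(samples, downstream_ready_pattern):
    """Move samples to a downstream block; one loop pass models one clock cycle."""
    if not any(downstream_ready_pattern):
        raise ValueError("downstream block is never ready")
    pending = list(samples)
    received = []
    cycle = 0
    while pending:
        output_valid = True            # the IP has a result waiting to be taken
        ready_for_output = downstream_ready_pattern[cycle % len(downstream_ready_pattern)]
        if output_valid and ready_for_output:
            received.append(pending.pop(0))   # data moves only when both agree
        cycle += 1
    return received, cycle


# Downstream block accepts data only on every other cycle
print(transfer([1, 2, 3], [True, False]))  # ([1, 2, 3], 5)
```

No sample is lost or duplicated when the downstream block stalls; the transfer simply takes more cycles, which is the behavior the handshaking signals exist to guarantee.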
Figure 24. The generated IP is wrapped in an IP Integration Node and placed inside a VI that you can use within a single-cycle Timed Loop in your application.
This represents a good point for further verification of the generated VI. LabVIEW FPGA IP Builder generates the necessary VHDL simulation support files so you can use simulation mode to run the FPGA IP and verify its functionality, initiation interval, and latency. Refer to the Validating Generated FPGA IP topic of the product documentation for more information on how to accomplish these tasks.
Once you have integrated the IP into your design, you can proceed to compile it. If the compilation step succeeds, the generated IP can function at the required rates. At this point, you can validate that the application works by running the necessary system- or component-level tests.
5. Hardware Support
LabVIEW FPGA IP Builder supports the latest and most popular set of NI FPGA hardware targets. Refer to the product release notes for a detailed list of supported hardware.
6. Try LabVIEW FPGA IP Builder Today
You can download a fully functional copy of LabVIEW FPGA IP Builder and evaluate it for a limited time. The tool includes several examples you can use, regardless of whether you have hardware available to deploy the generated IP.
Check out the LabVIEW FPGA IP Builder Community for additional multimedia training materials as well as additional examples and IP that you can use in your designs.