If you are designing a large application, your algorithm VI might contain many subVIs, arrays, nested For Loops, Case structures, and so on. Configuring directives for such an algorithm VI is challenging because there are many directives and options to configure, and the default settings might result in poor performance or excessive resource usage. Use the following steps to improve the performance of the FPGA IP.
2. Examine and refine your algorithm VI
The FPGA IP Builder allows you to use general LabVIEW code as input to the tool. However, because the end goal is still to target FPGA hardware, keep the following general guidelines in mind.
- Minimize the number of arrays and array buffers. Copying an array on an FPGA takes both time and hardware resources.
- If you use fixed-point numbers, keep their word lengths as short as possible. Operations on long words consume more hardware resources and result in longer latency and a lower clock rate. You can use normalized variables in your algorithm VI to preserve enough accuracy with short fixed-point numbers. Do not let the fixed-point word length grow out of control. Whenever possible, use the Truncate and Wrap modes for fixed-point operators.
- If you have mathematical formulas, use the simplest equivalent expression. For example, Multiply functions require fewer hardware resources than Divide functions. Changing the order of numerical operations can also save hardware resources. For example, to calculate Matrix * Matrix * Vector, compute it as Matrix * (Matrix * Vector): the inner product yields a vector, so the calculation needs two matrix-vector multiplications instead of a matrix-matrix multiplication followed by a matrix-vector multiplication.
- For examples of mathematical operations such as sine, cosine, or divide, open the NI Example Finder by selecting Help>>Find Examples in LabVIEW, and then navigate to Toolkits and Modules>>FPGA IP Builder>>Mathematics.
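To see why the order of operations matters, the following Python sketch counts the scalar multiplications in each ordering of Matrix * Matrix * Vector for a 3-by-3 example. It is only a software proxy under stated assumptions: the matrices are arbitrary, and the number of multipliers the FPGA IP Builder actually instantiates also depends on your directives.

```python
def matmul(a, b, counter):
    """Multiply matrices given as lists of rows; count scalar multiplies in counter[0]."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for k in range(inner):
                acc += a[i][k] * b[k][j]
                counter[0] += 1  # one scalar multiplication
            out[i][j] = acc
    return out


# Arbitrary example data (assumption): two 3-by-3 matrices and a column vector
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
v = [[1], [2], [3]]  # column vector stored as a 3-by-1 matrix

left = [0]
res_left = matmul(matmul(A, B, left), v, left)     # (Matrix * Matrix) * Vector
right = [0]
res_right = matmul(A, matmul(B, v, right), right)  # Matrix * (Matrix * Vector)

print(res_left == res_right, left[0], right[0])  # True 36 18
```

Both orderings give the same result, but for n-by-n matrices the left-to-right order costs n^3 + n^2 multiplications while the right-to-left order costs 2n^2, so the gap grows quickly with n.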
3. Divide your design into small functional blocks
Normally, a large application can be described as a block flowchart. National Instruments recommends that you first design your algorithm VIs as a block flowchart so that it’s easy for you to maintain and analyze your algorithm VIs. Then you can figure out specific throughput/latency/resource requirements for each functional block, and optimize those functional blocks to meet the requirements.
You may need to adjust the requirements during the design process. For example, if you want to limit the total latency of the following diagram to a certain value, check the estimation report to identify which path takes more cycles to execute. For the slower path, the primary goal is to shorten the latency, and for the other path, the primary goal is to reduce the resource usage.
Figure 1: Diagram with two data paths
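The reasoning behind these goals can be sketched in Python, assuming the two paths run in parallel and join at the end so that the slower path sets the total latency. The path names, latencies, and budget below are hypothetical, not values from a real estimation report.

```python
def analyze_paths(path_latency, latency_budget):
    """Assumes parallel paths that join, so the slowest path sets the total latency."""
    critical = max(path_latency, key=path_latency.get)
    total = path_latency[critical]
    plan = {}
    for name, cycles in path_latency.items():
        if name == critical:
            plan[name] = "shorten latency"
        else:
            # Slack: how many cycles this path could grow before it became the slower path
            plan[name] = f"reduce resources (slack {total - cycles} cycles)"
    return total, total <= latency_budget, plan


# Hypothetical per-path latencies in clock cycles
total, meets_budget, plan = analyze_paths({"path_a": 42, "path_b": 17}, latency_budget=30)
print(total, meets_budget)  # 42 False
print(plan)
```

Any cycles spent shortening the faster path do not reduce the total, which is why its slack is better traded for lower resource usage.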
Use one of the following options to implement your design:
- Use the FPGA IP Builder to generate IP for each functional block and combine the generated VIs in LabVIEW FPGA. This option makes it easier to generate high-performance IP than the other option, but it requires knowledge of the LabVIEW FPGA handshake protocol. You might also need to write additional LabVIEW FPGA code to handle different IP interface types.
- Use the FPGA IP Builder to find directives for each functional block, and then combine all directives in the directives item of the top-level VI. You may need to enable the Inline off directive for each functional block to ensure that the FPGA IP Builder produces the same performance and resource usage results as the individual estimations.
No matter which option you choose, the golden rule is to figure out the best design for each functional block first.
4. Determine the interface type of your IP
You need to configure the interface of your IP properly for array inputs and outputs. If you choose the All elements interface, be aware that LabVIEW FPGA uses at least one additional copy of the array outside the IP block. The following example illustrates the importance of choosing a proper interface type. It contains a tri-linear interpolation VI that computes one output from eight 24-bit input values, and this computation repeats 10 times. The subVI can achieve an initiation interval of 1 cycle and a latency of 10 cycles, so the top-level VI can achieve an initiation interval of 10 cycles and a latency of 19 cycles (the last of the 10 iterations starts at cycle 9 and finishes 10 cycles later). In this situation, if you choose the element-by-element interface, LabVIEW takes 80 cycles to read all 80 input values, which is much longer than the initiation interval. On the other hand, if you choose the All elements interface, the FPGA IP Builder uses 80*24=1920 registers to store all the input data, which consumes a lot of resources.
Figure 2: Compute tri-linear interpolation 10 times
Instead, you can group the eight input values of each interpolation into a cluster and choose the Element-by-element, unbuffered interface to save both time and resources.
Figure 3: Modified tri-linear interpolation
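The cycle and register figures in this example can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch, assuming that an element-by-element interface transfers one array element per clock cycle; it models only the arithmetic, not the FPGA IP Builder itself.

```python
# Numbers from the tri-linear interpolation example
ITERATIONS = 10       # interpolations per call
INPUTS_PER_ITER = 8   # input values per interpolation
WORD_BITS = 24        # bits per input value
SUB_II = 1            # subVI initiation interval (cycles)
SUB_LATENCY = 10      # subVI latency (cycles)


def pipelined_latency(iterations, ii, latency):
    # The last iteration starts at (iterations - 1) * ii and needs `latency` more cycles
    return (iterations - 1) * ii + latency


def top_level_ii(iterations, ii):
    # A new call can start only after every iteration has been issued
    return iterations * ii


def read_cycles(values_per_element):
    # Assumption: the interface transfers one array element per clock cycle
    total_values = ITERATIONS * INPUTS_PER_ITER
    return -(-total_values // values_per_element)  # ceiling division


def all_elements_register_bits():
    # Every input value is held in registers at the same time
    return ITERATIONS * INPUTS_PER_ITER * WORD_BITS


print(pipelined_latency(ITERATIONS, SUB_II, SUB_LATENCY))  # 19
print(top_level_ii(ITERATIONS, SUB_II))                     # 10
print(read_cycles(1))                                       # 80: array of scalars
print(read_cycles(INPUTS_PER_ITER))                         # 10: array of 8-value clusters
print(all_elements_register_bits())                         # 1920
```

With one cluster per element, reading the inputs takes 10 cycles, which matches the top-level initiation interval instead of exceeding it eightfold.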
5. Optimize individual functional blocks
After dividing a large application into individual functional blocks, refer to the Creating and Configuring Directives topic of the FPGA IP Builder Help for information about optimizing each functional block. Start by optimizing the smallest VIs.
If the estimated performance or resource usage falls short of your expectations, check the detailed performance and resource report to identify which subVI needs further optimization. The FPGA IP Builder automatically inlines small subVIs to reduce the overhead of subVI calls, so the estimation report does not contain information about those subVIs. Therefore, first enable the Inline off directive for all subVIs to see the detailed report, and then disable this directive to finalize your design.
Check the log of the estimation report to find out why the requirements are not met. For example, if you specify a value for the Initiation interval directive of a subVI that contains a For Loop with a variable loop count, the log shows “@W [SCHED-65] Unable to satisfy pipeline directive: Function contains subloop(s) not being unrolled”. This warning appears because LabVIEW could not unroll the For Loop, which has a variable loop count.
Figure 4: Diagram with a variable loop count
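If the log is long, a small script can pull out the warnings for review. The following Python sketch is hypothetical: its pattern is inferred only from the single SCHED-65 message quoted above, so treat the "@<letter> [CODE-nn] text" format as an assumption and adjust it to the logs you actually see.

```python
import re

# Assumed message format, inferred from the one example message in this section:
# "@W [SCHED-65] Unable to satisfy pipeline directive: ..."
MESSAGE = re.compile(r"@(?P<severity>[A-Z])\s+\[(?P<code>[A-Z]+-\d+)\]\s+(?P<text>.*)")


def find_messages(log_text, severity="W"):
    """Return (code, text) pairs for log lines that match the assumed format."""
    found = []
    for line in log_text.splitlines():
        match = MESSAGE.search(line)
        if match and match.group("severity") == severity:
            found.append((match.group("code"), match.group("text").strip()))
    return found


# Sample log text: the quoted warning plus a hypothetical unrelated line
SAMPLE_LOG = """Hypothetical unrelated log line
@W [SCHED-65] Unable to satisfy pipeline directive: Function contains subloop(s) not being unrolled
"""

for code, text in find_messages(SAMPLE_LOG):
    print(code, text)
```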
The following list contains some tips specifically for the FPGA IP Builder Early Access Program to improve the performance/resource usage of a large application.
- Specify the smallest possible value for the Initiation interval directive of each subVI and of the top-level VI. This approach can force the FPGA IP Builder to optimize subVIs aggressively.
- Instead of setting the Maximum latency directive, you can set the Initiation interval directive to achieve a short latency.
- If you use a subVI at multiple locations on the block diagram, specify a small value for the Initiation interval directive of this subVI and enable the Inline off directive. The FPGA IP Builder will reuse this subVI when possible.
- To set specific directives on particular nodes, wrap these nodes in a subVI.
- If you have two parallel For Loops, the FPGA IP Builder schedules them sequentially. Wrap them in separate subVIs to execute them in parallel.
- If you have multiple subVIs on the block diagram and want them to execute in parallel, either enable the Initiation interval directive for all of them or disable it for all of them.
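The benefit of moving two independent For Loops into separate subVIs can be estimated with simple arithmetic. The following Python sketch assumes each loop is pipelined, so its latency is (trip count - 1) * initiation interval + iteration latency; the trip counts, intervals, and latencies are hypothetical.

```python
def loop_latency(trip_count, ii, iteration_latency):
    # Pipelined loop: the last iteration starts at (trip_count - 1) * ii
    return (trip_count - 1) * ii + iteration_latency


def sequential_latency(loop_latencies):
    # Loops scheduled one after another: their latencies add up
    return sum(loop_latencies)


def parallel_latency(loop_latencies):
    # Loops running side by side: the slowest loop sets the total latency
    return max(loop_latencies)


# Hypothetical loops: (trip count, initiation interval, iteration latency)
loops = [loop_latency(32, 1, 8), loop_latency(16, 2, 6)]
print(loops)                      # [39, 36]
print(sequential_latency(loops))  # 75
print(parallel_latency(loops))    # 39
```

Running the loops in parallel generally requires both datapaths to exist at the same time, so the latency saving usually comes at some cost in resources.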
6. Verify generated IP
After configuring the directives, generate IP for each functional block and compile it at the expected clock rate to verify timing before you compile the whole design. This helps you locate the critical path much faster. Also create a separate testbench for each IP to make sure that it works as expected.
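A testbench typically compares the IP against a trusted reference over many input vectors. The following Python sketch illustrates only that golden-model idea, not a LabVIEW testbench: it reuses the tri-linear interpolation from step 4, models a fixed-point datapath with a hypothetical 12-bit fractional word length and truncation, and checks the worst error against a tolerance of 8 LSBs, which is also an assumption you would derive from your own word lengths.

```python
import math
import random

FRAC_BITS = 12        # hypothetical fractional word length of the fixed-point model
ONE = 1 << FRAC_BITS  # fixed-point representation of 1.0
LSB = 1.0 / ONE       # value of one least significant bit


def to_fixed(value):
    # Quantize a value in [0, 1) by truncation (round toward negative infinity)
    return math.floor(value * ONE)


def trilinear_reference(corners, x, y, z):
    """Floating-point golden model; corners[i + 2*j + 4*k] is the value at corner (i, j, k)."""
    result = 0.0
    for k, wz in enumerate((1 - z, z)):
        for j, wy in enumerate((1 - y, y)):
            for i, wx in enumerate((1 - x, x)):
                result += corners[i + 2 * j + 4 * k] * wx * wy * wz
    return result


def trilinear_fixed(corners, x, y, z):
    """Bit-accurate integer model of a fixed-point version with one final truncation."""
    c = [to_fixed(v) for v in corners]
    fx, fy, fz = to_fixed(x), to_fixed(y), to_fixed(z)
    acc = 0  # exact accumulator with 4 * FRAC_BITS fractional bits
    for k, wz in enumerate((ONE - fz, fz)):
        for j, wy in enumerate((ONE - fy, fy)):
            for i, wx in enumerate((ONE - fx, fx)):
                acc += c[i + 2 * j + 4 * k] * wx * wy * wz
    return (acc >> (3 * FRAC_BITS)) / ONE  # truncate back to FRAC_BITS fractional bits


def run_testbench(num_vectors=1000, tolerance=8 * LSB, seed=0):
    """Drive both models with random vectors and report the worst absolute error."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(num_vectors):
        corners = [rng.random() for _ in range(8)]
        x, y, z = rng.random(), rng.random(), rng.random()
        error = abs(trilinear_fixed(corners, x, y, z) - trilinear_reference(corners, x, y, z))
        worst = max(worst, error)
    return worst, worst <= tolerance


worst, passed = run_testbench()
print(f"worst error = {worst / LSB:.2f} LSB, passed = {passed}")
```

Shrinking FRAC_BITS and rerunning the testbench is one way to explore how short the word length can get before accuracy becomes unacceptable.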
7. Integrate the generated IP into your application
In most cases, you still need to write LabVIEW FPGA code for data communication and I/O access in your application. This code can also consume a lot of FPGA resources if it contains arrays or multiple clock domains. Make sure it uses only a reasonable amount of FPGA resources before you integrate the generated IP into your application.
You must place the generated FPGA IP in a Single-Cycle Timed Loop. To achieve optimal runtime performance and hardware resource usage, National Instruments recommends that you place all I/O functions of this FPGA IP in the same Single-Cycle Timed Loop, as shown in the following figure:
Figure 5: Place all I/O functions in a Single-Cycle Timed Loop
However, in some cases you might have to place the I/O functions of the generated FPGA IP outside a Single-Cycle Timed Loop, as shown in the following figure:
Figure 6: Place I/O functions outside a Single-Cycle Timed Loop
In the previous figure, you wire the ready for input terminal of your IP to the input valid terminal, and wire the output valid terminal to the stop terminal of the Single-Cycle Timed Loop. In this way, you can use the Single-Cycle Timed Loop like a normal LabVIEW VI. If you use your IP this way, do not use the element-by-element interface for it.
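The effect of this wiring can be illustrated with a toy cycle-level model in Python. The ToyPipelinedIP class and call_like_a_subvi function are hypothetical and are not part of any LabVIEW API; the model assumes an IP with a fixed latency that accepts a new input every cycle and, as a stand-in computation, doubles its input.

```python
from collections import deque


class ToyPipelinedIP:
    """Hypothetical cycle-level model of an IP with a fixed latency (not a LabVIEW API)."""

    def __init__(self, latency):
        # One slot per pipeline stage; each slot holds (valid, data)
        self.stages = deque([(False, None)] * latency)

    def ready_for_input(self):
        return True  # this toy pipeline can accept a new input every cycle

    def clock(self, input_valid, data):
        # Advance one cycle: accept the new input and retire the oldest stage
        self.stages.append((input_valid, data * 2 if input_valid else None))  # toy computation
        return self.stages.popleft()  # (output valid, output data)


def call_like_a_subvi(ip, data, max_cycles=1000):
    """Mimic the wiring in Figure 6: ready for input drives input valid, output valid stops the loop."""
    for iteration in range(1, max_cycles + 1):
        input_valid = ip.ready_for_input()
        output_valid, output = ip.clock(input_valid, data)
        if output_valid:  # wired to the stop terminal of the loop
            return output, iteration
    raise RuntimeError("output valid was never asserted")


output, iterations = call_like_a_subvi(ToyPipelinedIP(latency=5), 21)
print(output, iterations)  # 42 6
```

From the caller's point of view, the loop behaves like a blocking call: it returns the first valid output after the IP's latency has elapsed.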