Pipelining Memory Requests on FPGA
The FPGA is capable of issuing requests to memory very quickly, but it can take some time for those requests to complete. For maximum throughput, keep issuing memory access requests while your current request is in flight, instead of waiting to issue another request until the current request/retrieve pair completes.
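A back-of-envelope cost model can make the difference concrete. The sketch below assumes, hypothetically, that issuing a request costs one clock cycle and that each response arrives a fixed `latency` cycles later. Real memory latency varies, and the function names are illustrative only, not part of any NI API.

```c
#include <assert.h>
#include <stdint.h>

/* Serialized: wait for each response before issuing the next request.
   Every request pays one issue cycle plus the full latency. */
uint64_t cycles_serialized(uint64_t requests, uint64_t latency)
{
    return requests * (1 + latency);
}

/* Pipelined: issue one request per cycle while earlier ones are in flight.
   Only the latency of the final request is left exposed. */
uint64_t cycles_pipelined(uint64_t requests, uint64_t latency)
{
    if (requests == 0) {
        return 0;
    }
    return requests + latency;
}

/* Ratio of serialized to pipelined cycle counts under this model. */
double pipelining_speedup(uint64_t requests, uint64_t latency)
{
    assert(requests > 0); /* the ratio is undefined for zero requests */
    return (double)cycles_serialized(requests, latency) /
           (double)cycles_pipelined(requests, latency);
}
```

Under these assumptions, 1000 requests at a latency of 50 cycles take 51,000 cycles when serialized and 1,050 cycles when pipelined.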
Whenever you optimize LabVIEW performance, avoid unnecessary memory copies and the overhead they add. With EDVRs, a copy is not only inefficient: you may accidentally end up working on a copy of the data instead of the actual data. To avoid copies, do not branch the data wire, and use shift registers instead of loop tunnels.
One of the things a driver usually does for you is abstract away hardware peculiarities of your system. HMB makes no driver calls inside the In Place Element Structure, so you may be exposed to some confusing behavior. For example, the processor on Zynq has a memory consistency model that allows it to speculatively execute case statements and to reorder memory accesses. In many cases this won't affect your application, but some algorithms have specific ordering requirements.
Consider a program that reads address 0 and then reads address 1. The program clearly indicates that address 0 should be read first, yet the processor is legally allowed to perform the reads in the reverse order. This can create a very confusing intermittent data error.
A solution to this problem is to execute a memory barrier. A memory barrier is an instruction that tells the processor that all memory accesses before the barrier must complete before any memory access after it. Memory barriers let you impose strict ordering on your memory accesses.
Use Memory Barrier.vi whenever you have strict ordering requirements on your HMB memory accesses.
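LabVIEW is graphical, so the following is only a rough C11 analogue of the idea, not a description of what Memory Barrier.vi does internally. The function name `read_in_order` and the two-element buffer layout are hypothetical. The sketch reads element 0, issues a full fence, and then reads element 1, so the second read cannot be moved ahead of the first.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Reads buf[0] and then buf[1], with a full fence between the two loads.
   `volatile` keeps the compiler from eliding or merging the loads. */
void read_in_order(const volatile uint64_t *buf, uint64_t *first, uint64_t *second)
{
    assert(buf != NULL && first != NULL && second != NULL);

    *first = buf[0];

    /* Full barrier: accesses before this point are ordered ahead of
       accesses after it. */
    atomic_thread_fence(memory_order_seq_cst);

    *second = buf[1];
}
```

Strictly speaking, the C standard defines fence ordering in terms of atomic operations. On typical ARM toolchains a sequentially consistent fence also emits a hardware barrier instruction, which is the effect this sketch is meant to illustrate.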
The DMA interface is a shared resource between HMB, FIFOs, and the Scan Engine. To minimize the performance impact on other DMA-based processes in your system, avoid unnecessary polling across the system bus. If one part of your application polls a resource, that resource should be on the same side of the system bus as the code that polls it, which reduces bus utilization. For example, it is better for the FPGA to poll Controls and Indicators, because they are implemented using registers.
All onboard DRAM interfaces specify a "Maximum Data Width". This value dictates the element stride in memory, which limits the data types that are valid as the element data type. The element stride stays the same regardless of the width of the data type you choose for onboard DRAM or Host Memory Buffer.
The Maximum Data Width for Host Memory Buffer is 64 bits, which means that the most memory-efficient choice is to always use 64-bit data types for HMB.
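The arithmetic behind this recommendation can be sketched in a few lines of C. The helper names below are hypothetical and exist only to illustrate the calculation: each element occupies one full stride slot, so a narrower data type leaves part of every slot unused.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of memory occupied by `elements` elements, one per stride slot. */
uint64_t bytes_consumed(uint64_t elements, unsigned stride_bits)
{
    return elements * (stride_bits / 8u);
}

/* Percentage of each stride slot that actually carries data. */
unsigned utilization_percent(unsigned element_bits, unsigned stride_bits)
{
    /* An element cannot be wider than the slot that holds it. */
    assert(stride_bits > 0 && element_bits <= stride_bits);
    return (element_bits * 100u) / stride_bits;
}
```

With a 64-bit stride, a 16-bit element type uses 25% of each slot, while a 64-bit element type uses 100%. In both cases, 1024 elements occupy 8192 bytes.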
Out of Range Accesses
Reading from or writing to addresses beyond the size of your memory block can produce very confusing behavior. Always be mindful of how many elements are allocated to your memory block. By enabling error terminals on your memory methods, you can detect out-of-bounds memory accesses.
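As a loose textual analogue of checking an error terminal, the C sketch below wraps reads and writes in functions that return a status code instead of touching memory outside the block. The types and names (`mem_block`, `mem_read`, `mem_write`, `MEM_ERR_OUT_OF_RANGE`) are hypothetical and do not correspond to any LabVIEW or NI API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { MEM_OK = 0, MEM_ERR_OUT_OF_RANGE = -1 } mem_status;

/* A memory block with a known number of allocated elements. */
typedef struct {
    uint64_t *data;
    size_t length; /* number of elements allocated */
} mem_block;

/* Reads element `index` into *out, or reports an out-of-range access.
   *out is left unchanged when the access is rejected. */
mem_status mem_read(const mem_block *m, size_t index, uint64_t *out)
{
    assert(m != NULL && out != NULL);
    if (index >= m->length) {
        return MEM_ERR_OUT_OF_RANGE;
    }
    *out = m->data[index];
    return MEM_OK;
}

/* Writes `value` to element `index`, or reports an out-of-range access. */
mem_status mem_write(mem_block *m, size_t index, uint64_t value)
{
    assert(m != NULL);
    if (index >= m->length) {
        return MEM_ERR_OUT_OF_RANGE;
    }
    m->data[index] = value;
    return MEM_OK;
}
```

Checking the returned status after every access plays the same role as wiring the error terminal: an out-of-range index is reported immediately instead of silently corrupting or returning unrelated data.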