A Digital Downconverter for the NI 5734

Publish Date: Jul 17, 2019

Overview

This document shows how to build a digital downconverter for the NI 5734 adapter module for NI FlexRIO. The downconverter decimates the incoming data by a factor of 4, from 120 MS/s to 30 MS/s, resulting in 24 MHz of flat bandwidth.

Table of Contents

  1. Theory
  2. Behavioral Model
  3. Structural Model
  4. Host Simulation
  5. Hardware Co-Simulation
  6. Real I/O
  7. Conclusions

1. Theory

High bandwidth digital downconverters (DDCs) are critical components in many high-performance signal processing systems, including receivers of modulated communications, medical imaging devices, and low-level RF control hardware for scientific research. These applications require both the magnitude and phase of a signal. These signals are typically acquired with high-speed, high-resolution analog-to-digital converters (ADCs), yet their information content does not occupy the entire Nyquist bandwidth of the ADC. In these cases, the sample rate can be reduced, but only after frequency shifting the band of interest down to DC. To preserve both the magnitude and phase of the signal, along with frequencies both above and below the center of the target band, separate components of the signal 90 degrees out of phase must be retained. These are often referred to as the real and imaginary, complex, or in-phase (I) and quadrature (Q) components.

DDCs convert a real, time-domain signal into a complex one centered at baseband. Frequency conversion is achieved by mixing (multiplying) the input signal with a digital sinusoid at the center of the band of interest. This creates copies of the signal of interest centered around zero and at twice the sinusoid frequency. To generate the separate complex components, the mixing is repeated with another sinusoid at exactly the same frequency but 90 degrees out of phase. Before the desired sample rate reduction can take place, the DDC must low-pass filter out the unwanted mixing product at twice the sinusoid frequency as well as any out-of-band signals acquired by the ADC. After filtering, the final step in digital downconversion is decimation, resulting in real and imaginary components of the original signal, centered at DC, with a reduced sample rate. The entire process is depicted below.
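For a single real tone, the mixing step can also be written out explicitly. The symbols used here (amplitude A, tone frequency f_0, phase \varphi, and mixer frequency f_c) are introduced only for illustration and do not appear in the original text; mixing with a complex exponential is one common way to express the pair of sinusoids 90 degrees out of phase.

```latex
x(t) = A\cos(2\pi f_0 t + \varphi)
     = \tfrac{A}{2}\,e^{\,j(2\pi f_0 t + \varphi)} + \tfrac{A}{2}\,e^{-j(2\pi f_0 t + \varphi)}

x(t)\,e^{-j 2\pi f_c t}
     = \underbrace{\tfrac{A}{2}\,e^{\,j(2\pi (f_0 - f_c) t + \varphi)}}_{\text{kept: offset } f_0 - f_c \text{ from DC}}
     + \underbrace{\tfrac{A}{2}\,e^{-j(2\pi (f_0 + f_c) t + \varphi)}}_{\text{removed by the low-pass filter}}

I(t) = x(t)\cos(2\pi f_c t), \qquad Q(t) = -x(t)\sin(2\pi f_c t)
```

When f_0 is close to f_c, the first term sits near DC and the second sits near twice the mixer frequency, which is why the low-pass filter can separate them.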

The remainder of this document will use the FPGA DSP design process described in An Introduction to High-Throughput DSP in LabVIEW FPGA to build a DDC for the NI 5734. The NI 5734 adapter module features four 16-bit inputs at 120 MS/s. The DDC will downconvert and decimate a single input data stream by a factor of 4, resulting in an I/Q rate of 30 MS/s, with 24 MHz of flat bandwidth.

2. Behavioral Model

LabVIEW provides all of the individual components of a DDC on the host. To generate the real and imaginary tones for the initial frequency shift, the Sine Pattern VI can be used.

Some additional logic is required to translate DDC attributes to the inputs of the Sine Pattern VI.

The mixing process can be emulated with a simple multiplication, producing a frequency-shifted signal when mixed with a pure tone.

The FIR Filter VI can be used to band-limit the frequency-shifted signals, suppressing the out-of-band image created by the mixing process. This provides alias protection for the subsequent hard decimation.

To implement a hard decimator, the Decimate 1D Array VI can be used.
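For readers without LabVIEW at hand, the same mix, filter, and decimate chain can be sketched in plain Python. This is a minimal illustration and not the attached VI: the Hamming-windowed sinc filter is a stand-in chosen for brevity (it does not reach the stop-band attenuation discussed later in this section), the 13.5 MHz cutoff is an assumed value, and all function names are hypothetical.

```python
import cmath
import math

FS_IN = 120e6      # ADC sample rate (from the text)
DECIM = 4          # decimation factor (from the text)
F_CENTER = 15e6    # DDC center frequency used in the text's test case
F_TONE = 10e6      # test tone used in the text's test case
NUM_TAPS = 187     # tap count reported in the text; the design method here differs

def lowpass_taps(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc low-pass filter, normalized to unity gain at DC."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz
    taps = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def ddc(samples, taps):
    """Mix to baseband, low-pass filter, and keep every DECIM-th output."""
    mixed = [s * cmath.exp(-2j * math.pi * F_CENTER * n / FS_IN)
             for n, s in enumerate(samples)]
    out = []
    # Only outputs with a full filter history are computed (no start-up transient).
    for start in range(0, len(mixed) - len(taps) + 1, DECIM):
        window = mixed[start:start + len(taps)]
        out.append(sum(h * v for h, v in zip(taps, window)))
    return out

def tone_level(iq, offset_hz, fs_hz):
    """Magnitude of the complex tone at offset_hz (single-bin DFT)."""
    acc = sum(v * cmath.exp(-2j * math.pi * offset_hz * k / fs_hz)
              for k, v in enumerate(iq))
    return abs(acc) / len(iq)

taps = lowpass_taps(NUM_TAPS, 13.5e6, FS_IN)
adc = [math.cos(2 * math.pi * F_TONE * n / FS_IN) for n in range(4096)]
iq = ddc(adc, taps)[:960]     # 960 samples hold a whole number of 5 MHz cycles
fs_out = FS_IN / DECIM
wanted = tone_level(iq, F_TONE - F_CENTER, fs_out)
print(f"{fs_out / 1e6:.0f} MS/s out, level at -5 MHz offset: {wanted:.3f}")
```

A unit-amplitude real tone carries half its amplitude in each of its positive and negative frequency components, so the level reported at the -5 MHz offset should be close to 0.5.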

While these provide all of the necessary operations to implement the DDC, we must also design the FIR filter coefficients to filter out unwanted signals as well as the image created by mixing. Looking ahead to an eventual FPGA implementation, we should attempt to minimize hardware resources.

The primary driver of the FIR size is the slope of the filter roll-off in the transition region. A steeper roll-off results in an increased number of filter taps, which directly translates to increased resource utilization. A good trade-off between performance and size is a transition region between 0.4 and 0.6 times the post-decimation sample rate. For this DDC, the input sample rate is 120 MS/s and the desired decimation factor is 4 (resulting in a 30 MS/s sample rate), so the filter should be flat from DC to 12 MHz (0.4 * 30 MHz), transition from 12 MHz to 15 MHz (0.5 * 30 MHz), and provide attenuation above 15 MHz. This results in separate I and Q signals, each with 12 MHz of bandwidth (at 30 MS/s), for an equivalent bandwidth of 24 MHz.

The next largest contributor to FIR size is the amount of attenuation in the stop band. This is determined by the post-DDC dynamic range requirement, which is system-dependent. A common DDC dynamic range requirement is 100 dB, though this may vary with ADC resolution, adjacent signals, and the nature of the in-band signal.

To generate an FIR filter with these performance parameters, we can use the Configure Filter Design Express VI from the NI Digital Filter Design Toolkit. Simply enter the desired performance parameters, and the Express VI will create a filter that achieves them.

For the NI 5734 DDC, we see that the filter order is 186, meaning that the FIR filter will have 187 taps.
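The band edges quoted above follow directly from the decimated sample rate, and a widely used rule of thumb gives a tap count in the same neighborhood as the 187 taps reported here. The sketch below works through that arithmetic. The estimate formula (sample rate over transition width, times attenuation in dB over 22) is an approximation brought in for illustration; it is an assumption, not the algorithm used by the Express VI.

```python
FS_IN_HZ = 120e6       # ADC sample rate (from the text)
DECIMATION = 4         # decimation factor (from the text)
STOPBAND_DB = 100.0    # stop-band attenuation target (from the text)

fs_out_hz = FS_IN_HZ / DECIMATION             # 30 MS/s after decimation
passband_edge_hz = 0.4 * fs_out_hz            # filter is flat up to 12 MHz
stopband_edge_hz = 0.5 * fs_out_hz            # attenuation begins at 15 MHz
transition_hz = stopband_edge_hz - passband_edge_hz

# I and Q each carry the pass band, so the usable complex bandwidth doubles.
complex_bandwidth_hz = 2 * passband_edge_hz

# Rule-of-thumb estimate (assumption, see lead-in):
# taps ~= (input sample rate / transition width) * (attenuation in dB / 22)
estimated_taps = (FS_IN_HZ / transition_hz) * (STOPBAND_DB / 22.0)

print(f"pass band edge   : {passband_edge_hz / 1e6:.0f} MHz")
print(f"stop band edge   : {stopband_edge_hz / 1e6:.0f} MHz")
print(f"complex bandwidth: {complex_bandwidth_hz / 1e6:.0f} MHz")
print(f"estimated taps   : {estimated_taps:.0f}")
```

The estimate depends only on the ratio of sample rate to transition width and on the attenuation target, which is why narrowing the transition region or raising the attenuation requirement both grow the filter.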

In addition to the floating-point representation of the DDC, we should also model the fixed-point performance of the design. It is common to model the 16-bit ADC input data as a 16-bit signed fixed-point number with 1 integer bit, preserving the full resolution of the ADC and providing a range of -1 to 1. It is also common to use the 18-bit input of the 25x18-bit DSP48E multiplier on the FPGA for the coefficients, so those can be coerced to an 18-bit signed fixed-point number with -1 integer bits, as the coefficients are all within the range of -0.25 to 0.25. Lastly, the noise performance of the mixer tone when implemented on the FPGA can limit overall DDC performance, so 20-bit mixer waveforms (signed, 20-bit, with 1 integer bit) are necessary to model the eventual FPGA implementation.
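The three fixed-point formats above can be emulated on floating-point values with a small quantizer. This is a sketch under stated assumptions: the integer bit count is taken to include the sign bit (which matches the ranges quoted above), and round-to-nearest with saturation is assumed because the text does not state the rounding or overflow modes. The function name is hypothetical.

```python
def quantize(value, word_length, integer_length):
    """Round to a signed fixed-point grid and saturate at the range limits.

    word_length is the total number of bits. integer_length is the number of
    bits left of the binary point, sign bit included; it may be negative.
    """
    step = 2.0 ** (integer_length - word_length)     # weight of one LSB
    lowest = -(2.0 ** (integer_length - 1))
    highest = 2.0 ** (integer_length - 1) - step
    rounded = round(value / step) * step
    return min(max(rounded, lowest), highest)

adc_sample = quantize(0.123456789, 16, 1)          # ADC data, range [-1, 1)
coefficient = quantize(0.2, 18, -1)                # FIR tap, range [-0.25, 0.25)
mixer_tone = quantize(0.7071067811865476, 20, 1)   # mixer waveform, range [-1, 1)

for name, value in (("adc", adc_sample), ("tap", coefficient), ("tone", mixer_tone)):
    print(f"{name:4s} {value!r}")
```

Note that a negative integer bit count simply moves the binary point further left, so the 18-bit coefficient format spends all of its bits on the range the coefficients actually occupy.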

Finally, to provide stimulus data for the behavioral model, we can easily create test waveforms with the Analog Waveform Editor.

Several test waveforms are included in the attached files. Furthermore, there are also a few files of actual data captured using the NI 5734 adapter module which can be used for a more realistic simulation.

The overall floating and fixed-point simulation can be found in the NI 5734 DDC by 4 [Host - Behavioral] VI.

Additional math is used to convert the resulting data into the frequency domain for a more visual analysis. Using a 10 MHz test tone with a 15 MHz DDC center frequency, we can see that the floating-point and fixed-point representations match very well.

Only minor distortion products at multiples of the difference between the center frequency and the test tone can be observed. Hiding the fixed-point representation, we can see that the floating-point representation is almost identical.

Note that the tone at 20 MHz is actually the aliased mixing product (mixed up to 25 MHz, attenuated by the FIR filter, then aliased down to a 5 MHz offset through decimation), though it is attenuated by more than the 100 dB specified for the FIR filter coefficients. For an even more accurate simulation, we can use data previously captured by the NI 5734.
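The frequencies in this note can be checked with a few lines of arithmetic. The sketch below assumes the mixer shifts the spectrum down by the center frequency (so the unwanted product lands at -25 MHz rather than +25 MHz; only the sign differs) and that the displayed frequency axis is re-centered on the DDC center frequency. The helper name is hypothetical.

```python
def fold(freq_hz, fs_hz):
    """Map a frequency to its alias in the range [-fs/2, fs/2)."""
    return (freq_hz + fs_hz / 2) % fs_hz - fs_hz / 2

FS_OUT = 30e6      # I/Q rate after decimation by 4
F_CENTER = 15e6    # DDC center frequency
F_TONE = 10e6      # test tone

# A real tone has components at +F_TONE and -F_TONE; mixing shifts both down.
wanted_offset = F_TONE - F_CENTER      # -5 MHz, inside the filter pass band
image_offset = -F_TONE - F_CENTER      # -25 MHz, in the filter stop band

for name, offset in (("wanted", wanted_offset), ("image", image_offset)):
    alias = fold(offset, FS_OUT)       # where the component lands after decimation
    shown = F_CENTER + alias           # position on an axis centered at F_CENTER
    print(f"{name:6s} offset {offset / 1e6:+6.1f} MHz -> alias {alias / 1e6:+5.1f} MHz"
          f" -> displayed at {shown / 1e6:.1f} MHz")
```

The wanted tone is displayed at its original 10 MHz, while the filtered image folds to a +5 MHz offset and therefore appears at 20 MHz.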

Again, note that the performance of the floating-point and fixed-point representations is nearly identical. Also, overall performance is limited by that of the signal source (noise and distortion, including the 20 MHz tone). Additional simulation of signals at different frequencies (transition and stop band), multiple tones, broadband, modulated signals, and other stimuli is necessary to fully validate the design.

3. Structural Model

To build an FPGA implementation of a DDC, we start with the following framework for the FPGA VI, NI 5734 DDC by 4 [FPGA].vi.

The top loop will eventually read real data from the NI 5734 adapter module in its 120 MHz sample clock domain, convert it from unsigned to signed, then transfer to the bottom loop through a local FIFO. The bottom loop runs in its own free-running clock domain at a rate faster than that of the I/O, using a 2-wire protocol for flow control. Furthermore, it has the ability to process simulated data from the host using the False case of the left case structure.

This can be useful for host-based simulation of the structural IP, or for hardware co-simulation. The actual digital downconverter is implemented in a subVI, and the output I and Q samples are packed into a 64-bit number before being sent back to the host.

To build the DDC, we must replace the host-based operations with equivalent FPGA IP. Starting with the tones used for mixing, we will use a numerically controlled oscillator (NCO), provided by the XILINX™ CORE Generator™ DDS Compiler, which allows the frequency resolution and dynamic range to be customized. A convenient integration VI exists on the Xilinx Coregen IP palette.

For this design, it is configured as follows:

The main configuration options are the sample rate, frequency resolution, and spurious-free dynamic range, along with the choice to provide both sine and cosine outputs with the sine negated. The resulting core is imported into LabVIEW FPGA with the following configuration.

It is configured with the following inputs and outputs.

Note that the output valid signal (rdy) is true only when the input is valid and the initial calculation latency has passed; after that, rdy is true whenever input valid is true. Finally, this core is encapsulated into a subVI with a common synthesizer representation.
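The idea behind an NCO can be sketched as a wrapping phase accumulator whose value is converted to cosine and negated sine outputs. This is a conceptual model only: it calls the math library where hardware would use a lookup table, it does not model output quantization or spurious performance, and the 32-bit accumulator width is a hypothetical choice because the text does not state the value used in the DDS Compiler configuration.

```python
import math

def tuning_word(f_out_hz, fs_hz, phase_bits):
    """Phase increment per sample for a phase_bits-wide accumulator."""
    modulus = 1 << phase_bits
    return round(f_out_hz / fs_hz * modulus) % modulus

def nco(f_out_hz, fs_hz, phase_bits, count):
    """Yield (cos, -sin) pairs from a wrapping phase accumulator."""
    step = tuning_word(f_out_hz, fs_hz, phase_bits)
    modulus = 1 << phase_bits
    phase = 0
    for _ in range(count):
        angle = 2 * math.pi * phase / modulus
        yield math.cos(angle), -math.sin(angle)
        phase = (phase + step) % modulus     # overflow wraps, like hardware

PHASE_BITS = 32    # hypothetical width, not taken from the text
samples = list(nco(15e6, 120e6, PHASE_BITS, 8))
resolution_hz = 120e6 / (1 << PHASE_BITS)    # smallest frequency step available

print(f"tuning word: {tuning_word(15e6, 120e6, PHASE_BITS)}")
print(f"frequency resolution: {resolution_hz:.4f} Hz")
for cos_out, neg_sin_out in samples:
    print(f"{cos_out:+.4f} {neg_sin_out:+.4f}")
```

The accumulator width sets the frequency resolution (sample rate divided by two to the power of the width), which is the trade-off exposed by the frequency resolution option mentioned above.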

For the mixers, the high-throughput multiply VI enables the highest possible clock rates as well as 2-wire protocol compatibility.

Note that the data input is 16 bits, as described earlier, while the NCO input is the 20 bits requested above. This would normally create a 36-bit output, but that is far more dynamic range than is necessary, so the output is truncated to 24 bits for compatibility with the Xilinx DSP48E slice. The full 25-bit input cannot be used because one additional bit is needed for the symmetric optimization described below, which adds pairs of samples (growing the result by 1 bit) before multiplying by a common filter coefficient. This High Throughput Multiply exists on the diagram as configured below for 2-wire handshaking.
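The bit-width bookkeeping in this paragraph can be verified with integer arithmetic. The sketch below treats samples as signed integer codes and assumes the truncation drops the 12 least significant bits of the product; the text states the 24-bit result but not which bits are kept, so that choice and the function names are assumptions.

```python
DATA_BITS = 16    # ADC sample width (from the text)
NCO_BITS = 20     # mixer tone width (from the text)
KEPT_BITS = 24    # mixer output width after truncation (from the text)

def mix(data_code, nco_code):
    """Multiply two signed integer codes and drop the low-order product bits."""
    full = data_code * nco_code                          # fits in 36 signed bits
    return full >> (DATA_BITS + NCO_BITS - KEPT_BITS)    # arithmetic shift by 12

def bits_needed(value):
    """Smallest two's-complement width that can hold value."""
    magnitude_bits = value.bit_length() if value >= 0 else (~value).bit_length()
    return magnitude_bits + 1

most_negative_data = -(1 << (DATA_BITS - 1))    # -32768
most_negative_tone = -(1 << (NCO_BITS - 1))     # -524288

full_product = most_negative_data * most_negative_tone
mixed = mix(most_negative_data, most_negative_tone)
pre_added = mixed + mixed        # symmetric filter adds two samples per tap

print(f"full product needs {bits_needed(full_product)} bits")
print(f"truncated mixer output needs {bits_needed(mixed)} bits")
print(f"pre-added pair needs {bits_needed(pre_added)} bits")
```

The worst-case pair of mixer outputs grows to 25 bits after the pre-addition, which is the width of the wider multiplier input on the DSP48E slice.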

It is encapsulated with a common mixer representation.

Finally, the last piece of structural IP is the FIR filter. The XILINX™ CORE Generator™ FIR Compiler provides a very efficient FIR filter implementation, which includes built-in decimation that does not waste resources computing outputs destined for decimation. It also exists on the Xilinx Coregen IP palette.

To use this IP, we need a means of transferring filter coefficients from the Digital Filter Design Toolkit to the FIR Compiler. The included Coefficient Generator.vi does this, writing the coefficients to a Xilinx-compatible *.coe file as well as a *.xml file.
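To give a sense of what such a coefficient file contains, the sketch below scales a few placeholder taps to 18-bit integer codes and renders them as text. The layout (a radix line followed by a coefdata list) is assumed here from the commonly documented Xilinx coefficient-file format and should be checked against the FIR Compiler documentation; the included Coefficient Generator.vi may differ in detail. The tap values and function names are hypothetical.

```python
def to_codes(coefficients, word_length, integer_length):
    """Scale real-valued taps to signed integer codes, saturating at the limits."""
    scale = 2 ** (word_length - integer_length)
    top = 2 ** (word_length - 1) - 1
    return [max(-top - 1, min(top, round(c * scale))) for c in coefficients]

def format_coe(codes):
    """Render integer taps as the text of a decimal-radix coefficient file."""
    body = ",\n".join(str(code) for code in codes)
    return f"radix=10;\ncoefdata=\n{body};\n"

# Placeholder symmetric taps, not the real 187-tap filter.
example_taps = [-0.01, 0.05, 0.2, 0.05, -0.01]

codes = to_codes(example_taps, 18, -1)    # 18-bit signed, -1 integer bits
coe_text = format_coe(codes)
print(coe_text)
```

Writing `coe_text` to a file with a .coe extension would complete the hand-off; that step is left out so the sketch has no side effects.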

After this, the core is configured as follows.

We see that while there are two filters implemented with 187 filter taps each, the size of the core is only 50 DSP slices. This is because, for each filter, the decimation optimization reduces the number of multiplies per incoming data point by a factor of 4, and a symmetric optimization in the structure of the FIR filter halves it again. That results in 25 DSP slices per filter, and double that for the separate I and Q filters.
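Both optimizations can be illustrated in software by counting multiplies. The sketch below compares a reference that filters every sample and then discards three of every four outputs against a version that computes only the kept outputs and adds each pair of samples sharing a coefficient before multiplying. It illustrates the arithmetic savings only and says nothing about how the FIR Compiler actually maps operations onto DSP slices; the data, taps, and function names are hypothetical.

```python
def filter_then_decimate(samples, taps, factor):
    """Reference: compute every output, then keep every factor-th one."""
    n = len(taps)
    full = [sum(h * x for h, x in zip(taps, samples[i:i + n]))
            for i in range(len(samples) - n + 1)]
    return full[::factor], len(full) * n

def decimating_symmetric_fir(samples, taps, factor):
    """Compute only the kept outputs, pre-adding samples that share a tap."""
    n = len(taps)
    half = n // 2
    outputs, multiplies = [], 0
    for i in range(0, len(samples) - n + 1, factor):
        window = samples[i:i + n]
        acc = 0
        for k in range(half):
            acc += taps[k] * (window[k] + window[n - 1 - k])   # one multiply, two taps
            multiplies += 1
        if n % 2:
            acc += taps[half] * window[half]                   # unpaired center tap
            multiplies += 1
        outputs.append(acc)
    return outputs, multiplies

# Small integer example so the two results can be compared exactly.
taps = [1, -2, 3, 7, 3, -2, 1]                                 # symmetric taps
samples = [(7 * i * i + 3 * i) % 31 - 15 for i in range(64)]   # arbitrary test data

reference, reference_mults = filter_then_decimate(samples, taps, 4)
optimized, optimized_mults = decimating_symmetric_fir(samples, taps, 4)

print(f"outputs match: {reference == optimized}")
print(f"multiplies: {reference_mults} (reference) vs {optimized_mults} (optimized)")
```

Skipping the discarded outputs cuts the work by the decimation factor, and the pre-addition roughly halves what remains, which mirrors the two reductions described above.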

The LabVIEW integration of the FIR core is straightforward, as it includes the signals necessary for 2-wire handshaking.

This is encapsulated into a subVI which represents both the filter and decimation stages.

When combined with the other blocks, it is very simple to assemble a full DDC, which exists in the simulation framework described above.

Note that this representation is very similar to the theoretical DDC representation depicted in the first image of this document. Because the 2-wire protocol has been added, this IP can process data at any rate up to the loop rate in which it resides. Of course, for sample rates other than 120 MS/s, the filter cutoffs, DDS frequencies, and so on scale accordingly; the net operation remains an alias-protected DDC by 4.

4. Host Simulation

To confirm that the structural IP matches the fixed-point behavioral model, we can run the FPGA VI on the host before compiling anything. As mentioned in An Introduction to High-Throughput DSP in LabVIEW FPGA, the compilation process can take a significant amount of time, so finding all possible errors before compilation accelerates overall development time. In the LabVIEW project, select execution on the “Development Computer with Simulated I/O” so that the FPGA VI will run in a “dataflow-accurate” simulation on the host PC.

A new host simulation called NI 5734 DDC by 4 [Host – Simulation].vi replaces the fixed point representation with calls into the NI-RIO driver which downloads the test data to the FPGA VI, waits for the simulation to complete, then displays the results.

As the FPGA VI must call into a DLL for simulation of the Xilinx IP, the simulation will take longer than the equivalent behavioral model. For instance, it takes approximately 40 seconds to simulate 4096 points on an Intel Core i7 M 620 CPU. We see that there is now a difference between the behavioral floating-point and structural fixed-point representations, which is not surprising as the structural representation uses a different algorithm. This difference, however, is far below the stated performance requirements, so it can be disregarded.

Using the real NI 5734 data from the behavioral simulation, we see better correlation between the behavioral and structural models with data exhibiting realistic dynamic range.

Again, at this point additional simulation of signals at different frequencies (transition and stop band), multiple tones, broadband, modulated signals, and other stimulus is necessary to fully validate the design, and the stimulus data from the behavioral simulation can be re-used.

5. Hardware Co-Simulation

Once the design becomes relatively stable and the structural IP is approaching completion, it may be more efficient to compile the FPGA VI and use it for hardware-accelerated simulations. This is often referred to as hardware co-simulation. Such compilations may occur overnight, out of the design loop. The host co-simulation VI, NI 5734 DDC by 4 [Host Co-Simulation].vi, need only target real hardware instead of a simulated FPGA VI, though we can remove any loops which wait on the simulation to complete and simply request the appropriate amount of data from the FPGA.

While simulating 4096 points on the host took 40 seconds, that same simulation on the FPGA takes a negligible amount of time, limited by the rate at which the host VI can supply stimulus data. This can be convenient for very long simulations. Again, any deviations from the behavioral simulation will not impact real-world performance.

6. Real I/O

Once the structural IP has been completely simulated with both simulated and real-world stimulus, we can incorporate the real-world I/O. The FPGA VI has already been configured for real I/O, with a case structure selecting either real or simulated data. On the host, however, the behavioral simulation can be removed, because the real-time input data is not available there and so there is no simple way to run it. Because the data now arrives in real time, the VI front panel can be much more responsive.

Note that there are no mechanisms in place to guarantee contiguous data delivery between the FPGA and host, so fetches are limited to the depth of the FPGA side of the FIFO, which will be contiguous provided the entire FIFO is cleared between fetches.

Using a real signal generator (the same one used to capture the test waveforms, which is still the limiting factor in overall system performance), we can view the DDC results in real time.

Note that the fixed point representation / real-world data is now shown in red, and there is no floating point / behavioral representation against which to contrast.

7. Conclusions

In this document we have described how to build a digital downconverter using the DSP design process outlined in An Introduction to High-Throughput DSP in LabVIEW FPGA. Through proper simulation techniques, we can explore performance and design details without needing any hardware, and minimize the number of lengthy compilations to achieve a functional design.

Customer Reviews

Need update with the latest FIR compiler - Jul 28, 2016

This paper is so very helpful to understand the efficient flow of signal processing implementation on FPGA. FIR compiler requires AXI4-Stream interface after Vivado. This whitepaper needs an update for it.

Very Helpful  - Sep 15, 2014

Very Helpful

Downloads

Attachments:

NI 5734 DDC by 4
