LabVIEW Communications 802.11 Application Framework 1.1 White Paper

Updated Jun 15, 2023

Overview

The 802.11 Application Framework provides a ready-to-run, easily modifiable real-time physical layer (PHY) and lower medium access control (MAC)-layer reference design based on the 802.11 wireless standard. The 802.11 Application Framework is available with the LabVIEW Communications System Design Suite, also referred to as LabVIEW Communications.

This application framework provides a substantial starting point for researchers looking for ways to improve the 802.11 standard by exploring brand-new algorithms and architectures that can support the tremendous increase in the number of terminals, inventing new waveforms by which to modulate and demodulate the signals, or finding new multi-antenna architectures that fully exploit the degrees of freedom in the wireless medium.

The 802.11 Application Framework consists of modular PHY and MAC blocks implemented using LabVIEW Communications. It is designed to run on the Xilinx Kintex-7 FPGA and an Intel x64 general-purpose processor, which are tightly integrated with the RF and analog front ends of the NI software defined radio (SDR) hardware.

The framework is designed from the ground up for easy modifiability, while adhering to the main specifications of the 802.11 standard. This design allows wireless researchers to quickly get their real-time prototyping laboratory set up and running based on the 802.11 standard. They can then primarily focus on selected aspects of the protocol that they wish to improve, and easily modify the design and compare their innovations with the existing standards.

Scope
Architectural Overview
Implementation Details
Performance
Conclusion
APPENDIX
Abbreviations
Endnotes
Bibliography

Scope

The 802.11 Application Framework provides functional elements of the physical (PHY) as well as the medium access control (MAC) layer of a single station. This code includes receiver (RX) and transmitter (TX) functionality and functional elements for channel state handling, slot timing management, and backoff procedure handling.

The following subsections describe which PHY and MAC functionalities from the 802.11a and 802.11ac standards are supported by the 802.11 Application Framework version 1.1, as well as its level of compliance with the 802.11 standards [1] and [2].

1.1 PHY Layer

The 802.11 Application Framework provides the following PHY transmitter functionalities:

Scrambler
Convolutional coding and bit interleaving
BPSK/QAM constellation mapper up to 256-QAM
Pilot sequence generation
Signal fields generation
OFDM symbol generation based on IFFT
Guard interval (GI) insertion
Addition of training fields

Corresponding functionalities, supplemented by specific functions, are provided for the receiver side:

Clear channel assessment (CCA) based on energy detection as well as on signal detection
Packet detection
Automatic gain control (AGC)¹
Time and frequency synchronization
Guard interval removal
OFDM symbol demodulation based on FFT
Channel estimation and zero-forcing equalization
BPSK/QAM demodulation
Deinterleaving
Convolutional decoding based on Viterbi decoding
Descrambling
Signal fields decoding and reconfiguring the receiver based on signal fields information

Note that the terms “guard interval” and “cyclic prefix” are used synonymously in this document.

The 802.11 Application Framework is designed to support different OFDM configurations as specified in Chapter 18 of [1] and Chapter 22 of [2]. In version 1.1 of the Application Framework, the 20 MHz 802.11a and 20 MHz (VHT20) and 40 MHz (VHT40) channel widths from 802.11ac are supported².

The 802.11 Application Framework follows the PHY frame format as specified in Section 18.3 of [1] for 802.11a and in Section 22.3 of [2] for 802.11ac. Figure 1 and Figure 2 show the formats for 802.11a and 802.11ac, respectively. Fields specifically needed for the 802.11ac format have the name extension Very High Throughput (VHT), and fields specifically needed for the 802.11a format are referred to as Legacy.

Figure 1 PHY Frame Format 802.11a

Figure 2 PHY Frame Format 802.11ac

Depending on the subcarrier format, the PHY frame consists of the following fields:

Legacy short training field (L-STF), a static field that is used for packet detection and AGC at the receiver.
Legacy long training field (L-LTF), a static field that is used for time and frequency synchronization as well as channel estimation.
Legacy signal field (L-SIG), a dynamic field that contains information about the applied modulation and coding scheme (MCS) and the frame length.
VHT signal field A (VHT-SIG-A), a dynamic field that contains information about the applied channel bandwidth, MCS, and, for MIMO configurations, multi-user settings.
VHT short training field (VHT-STF), a static field to be used for improving AGC estimation when performing MIMO transmission.
VHT long training field (VHT-LTF), a static field to be used for MIMO channel estimation purposes.
VHT signal field B (VHT-SIG-B), a dynamic field containing information about the frame length and the MCS for single-user or multi-user modes.
Legacy or VHT data (L-DATA or VHT-DATA) is a dynamic field, which is uniquely defined by the MAC message. The number of OFDM symbols NSYM used for the data depends on the chosen MCS and the payload length.
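As a worked example of how NSYM depends on the MCS and the payload length, the following Python sketch applies the 802.11a relation NSYM = ⌈(16 + 8 · LENGTH + 6) / NDBPS⌉ from Chapter 18 of [1], where 16 bits are the SERVICE field, 6 bits are the encoder tail, LENGTH is the PSDU length in bytes, and NDBPS is the number of data bits per OFDM symbol of the chosen rate. It covers only the 20 MHz 802.11a rates (the 802.11ac computation in [2] differs in detail), and the function name is illustrative, not part of the Application Framework.

```python
import math

# Data bits per OFDM symbol (N_DBPS) for the 20 MHz 802.11a rates, keyed by data rate in Mb/s
N_DBPS_80211A = {6: 24, 9: 36, 12: 48, 18: 72, 24: 96, 36: 144, 48: 192, 54: 216}

SERVICE_BITS = 16  # SERVICE field prepended to the PSDU
TAIL_BITS = 6      # tail bits that return the convolutional encoder to the zero state


def n_sym_80211a(psdu_length_bytes: int, rate_mbps: int) -> int:
    """Number of DATA OFDM symbols of an 802.11a PPDU; pad bits fill the last symbol."""
    n_dbps = N_DBPS_80211A[rate_mbps]
    total_bits = SERVICE_BITS + 8 * psdu_length_bytes + TAIL_BITS
    return math.ceil(total_bits / n_dbps)


# Example: a 1,500-byte PSDU at the highest and the lowest 802.11a rate
for rate in (54, 6):
    n_sym = n_sym_80211a(1500, rate)
    print(f"{rate} Mb/s: {n_sym} symbols, {4 * n_sym} us of DATA with long GI")
```

For the 1,500-byte example, the sketch yields 56 symbols at 54 Mb/s and 501 symbols at 6 Mb/s.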

1.2 MAC Layer

The 802.11 Application Framework provides the following MAC transmitter functional elements:

MPDU generation for ACK frames and data frames
Support of A-MPDU generation with a single MPDU as in Section 8.6.1 of [1]
Slot timing management
Backoff procedure
Frame check sequence (FCS) generation

Corresponding functional elements are provided for the receiver side:

FCS check
Address check and frame type handling including triggering of ACK frame transmission
MAC service data unit (SDU) extraction
Channel state handling

These functional elements provide the following capabilities, as defined in Section 9.3 of [1]:

Basic distributed coordination function (DCF)³
Transmission and reception of DATA-ACK frame sequences with SIFS timing

This MAC functionality is implemented on the FPGA and tightly integrated with the PHY to fulfill the requirements for interframe spacing, such as SIFS, DIFS and EIFS, as well as slot timing management to allow frame exchange sequences, such as DATA-ACK and basic DCF for CSMA/CA. Therefore, this functionality is referred to as lower MAC throughout the document.

The 802.11 Application Framework follows the frame structure for the MPDU as defined in Section 8 of [1]. Figure 3 and Figure 4 show the packet structure used for data and ACK frames, respectively.

Figure 3 Structure of a MAC Data Frame

Figure 4 Structure of a MAC ACK Frame

A MAC frame consists of the MAC header, the frame body, which may not be present in some frames, and the FCS field.

The MAC header consists of the following elements:
- Frame control:
  - MAC data frame: Type: Data, Subtype: Data
  - MAC ACK frame: Type: Control, Subtype: ACK
- Duration/ID: Empty, filled with zeros
- Address 1: Destination MAC address
- Address 2: Source MAC address
- Address 3: Empty, filled with zeros
- Sequence Control: Fragment Nr.: 0, Sequence Nr.: 0…4095
- Address 4: Empty, filled with zeros
- MAC header can be extended to include the fields “QoS Control” and “HT Control.” These fields are not implemented in version 1.1.
The frame body consists of the MSDU.
The FCS consists of a 32-bit CRC as defined in Section 8.2.4.8 of [1].
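To make the header layout above concrete, the following Python sketch assembles a data MPDU with exactly the listed fields (Duration/ID, Address 3, and Address 4 filled with zeros, fragment number 0) and appends the FCS. It assumes the standard 802.11 bit layout of the Frame Control and Sequence Control fields and uses Python's `zlib.crc32`, which computes the same CRC-32 as the 802.11 FCS. The function `build_data_mpdu` is hypothetical and not part of the framework.

```python
import struct
import zlib


def build_data_mpdu(dest: bytes, src: bytes, seq_num: int, msdu: bytes) -> bytes:
    """Assemble a MAC data frame with the header fields listed above and append the FCS."""
    if len(dest) != 6 or len(src) != 6:
        raise ValueError("MAC addresses must be 6 bytes long")
    if not 0 <= seq_num <= 4095:
        raise ValueError("sequence number must be in 0..4095")
    frame_control = 0x0008         # protocol version 0, type Data (0b10), subtype Data (0b0000), no flags
    duration_id = 0                # Duration/ID left empty (zeros)
    seq_ctrl = (seq_num << 4) | 0  # fragment number 0 in the 4 LSBs, sequence number in the 12 MSBs
    zero_addr = bytes(6)           # Address 3 and Address 4 are filled with zeros
    header = (struct.pack("<HH", frame_control, duration_id) + dest + src + zero_addr
              + struct.pack("<H", seq_ctrl) + zero_addr)  # 30-byte header
    frame = header + msdu                                 # frame body = MSDU
    fcs = zlib.crc32(frame) & 0xFFFFFFFF                  # CRC-32 over header and body
    return frame + struct.pack("<I", fcs)                 # FCS sent least significant byte first


mpdu = build_data_mpdu(bytes.fromhex("020000000001"), bytes.fromhex("020000000002"),
                       seq_num=5, msdu=b"hello")
print(len(mpdu), mpdu.hex())
```

A receiver-side FCS check can recompute `zlib.crc32` over everything except the last four bytes and compare it with the transmitted value.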

Figure 5 shows the A-MPDU format and also indicates that the 802.11 Application Framework version 1.1 supports A-MPDU with one single MPDU as in Section 8.6.1 of [1].

Figure 5 Format of A-MPDU

1.3 Compliance to IEEE 802.11 Standard

The 802.11 Application Framework is designed to be compliant with the IEEE 802.11 specifications. To keep the design easily modifiable, the 802.11 Application Framework focuses on the core functionality of the IEEE 802.11 standard.

Besides the standards-compliant implementation of the functionality mentioned in Section 1.1 and 1.2, the 802.11 Application Framework supports the following features:

Long guard interval only
Single input single output (SISO) architecture, ready for MIMO
Support of single user only for 802.11ac applications
VHT20 and VHT40 for 802.11ac applications
A-MPDU with a single MPDU for 802.11ac applications
Initial transmissions only, architecture ready for retransmissions
The backoff value is fixed on the FPGA but configurable during run-time using the host interface

The 802.11 Application Framework deviates from the IEEE 802.11 specifications in one way. For immediate access to the medium, the medium is sensed for 9 µs × backoff instead of for the DIFS period of 34 µs. For this reason, the default backoff is set to 4. For more information about this subject, refer to the known issues list and search for issue ID 524721.
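The choice of the default backoff can be checked with a few lines of arithmetic. Assuming the OFDM PHY values aSIFSTime = 16 µs and aSlotTime = 9 µs, DIFS = aSIFSTime + 2 × aSlotTime = 34 µs, and 4 is the smallest backoff whose sensing time (36 µs) is not shorter than one DIFS. This reading of why the default is 4 is an interpretation, not a statement taken from the known issues list.

```python
import math

A_SIFS_TIME_US = 16  # aSIFSTime of the OFDM PHY
A_SLOT_TIME_US = 9   # aSlotTime of the OFDM PHY

# DIFS = aSIFSTime + 2 x aSlotTime
DIFS_US = A_SIFS_TIME_US + 2 * A_SLOT_TIME_US


def sensing_time_us(backoff: int) -> int:
    """Medium sensing time of the framework's immediate-access check: 9 us x backoff."""
    return A_SLOT_TIME_US * backoff


# Smallest backoff whose sensing time is not shorter than DIFS
min_backoff = math.ceil(DIFS_US / A_SLOT_TIME_US)
print(DIFS_US, min_backoff, sensing_time_us(min_backoff))  # prints: 34 4 36
```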

1.4 Known Limitations

The 802.11 Application Framework provides a simplified AGC mechanism. Because it is partially implemented on the host, it takes longer to adjust: the AGC loop needs several received packets to compute and set the correct gain.

The CCA energy detection (CCA-ED) is based on a simple power measurement at a sampling rate of 80 MS/s. As a result, CCA-ED measures the energy within the entire RF bandwidth available on the device, regardless of the selected subcarrier format.

1.5 Graphic Formatting

Throughout this document and its subsequent parts, graphic elements are formatted as described in the following table.

Element	Usage
Blue rectangle (rounded edges)	Code block or VI
Blue rectangle	Code entity
Blue arrow	Data path
Yellow arrow	Control path
Red arrow	Reference to element

Table 1 Formatting Used for Graphics

Architectural Overview

This section provides an overview of the 802.11 Application Framework architecture. Section 2.1 describes the basic design ideas and the interface between MAC and PHY, and Section 2.2 describes their implications for the partitioning between the FPGA and the host. Section 2.3 gives an overview of design considerations, such as 802.11-specific timing requirements, clocking concepts, and the level plan.

2.1 Interface Between MAC and PHY

Because the Application Framework targets a wide range of researchers, the design is kept as close as possible to the IEEE 802.11 specifications.

An implication of this design is that the interface between PHY and MAC follows the PHY_SAP interface as described in Section 4.9 of [1]. Moreover, the Application Framework also follows the concept of PHY service primitives as described in Section 18 of [1] and Section 22 of [2]. For the sake of simplicity, the content of those PHY service primitives is aligned with the implemented feature set.

Figure 6 shows the partitioning within the PHY and lower MAC, which is very close to the concept proposed in the IEEE 802.11 SDL specifications. Refer to Section J.5 of [1] for more information about the SDL specifications. The tight timing requirements made it necessary to adapt the concepts from the 802.11 specifications to the hardware platform. For example, the MPDU generation module is connected directly to MAC RX and MAC TX to accelerate the MPDU assembly needed for ACK frame generation.

Figure 6 Functional Split Between PHY and Lower MAC

Release 1.1 of the 802.11 Application Framework provides only a subset of the upper MAC functionality. Because only a limited set of frame types is supported and no retransmission functionality is implemented, the protocol control function, which is considered part of the upper MAC, is quite simple. However, the design can be extended with more functionality, because the interfaces between the layers and between the FPGA and the host are already available and can be scaled accordingly.

Figure 7 shows an example of the sequential order of PHY service primitives for regular transmission and reception. Version 1.1 of the Application Framework implements only the essential primitives. An example of a primitive that is not implemented is PhyTxStart.cnf, which could indicate an error event from PHY to MAC after the PHY layer receives a PhyTxStart.req.

Figure 7 PHY Service Primitive Sequence Chart for Regular RX and TX

2.2 Partitioning Between FPGA and Host

The functionality described in Section 1 and the architectural split described in Section 2.1 are mapped to the host processor, target FPGA, and RF hardware as shown in Figure 8.

The target FPGA contains the lower MAC and baseband PHY layer algorithms for both TX and RX. It also contains RF front-end control functionality taken from the sample streaming projects of the supported RF devices and sample rate conversions to interface baseband signals to the RF front end.

The host processor is dedicated to upper MAC layer functionality, including protocol control. In version 1.1 of the Application Framework, MPDU generation for data frames is implemented on the FPGA. This might change in upcoming releases once LabVIEW Communications supports real-time operating systems.

The data path interface between the host and target is realized using a proprietary message-based interface communication protocol (ICP), which is described in Section 6.2.2. The control path interface between the host and target for baseband configuration is implemented using LabVIEW controls and indicators. RF configuration is completed using target-specific driver VIs.

Figure 8 Functional Split Between Host, FPGA, and RF

2.3 Design Considerations

2.3.1 Clocking

There are two clock domains within the FPGA design. The first clock domain is used for the RF loop. Its rate depends on the target and is referred to as the data clock. For the USRP RIO with 40 MHz bandwidth, the data clock is set to 120 MHz; for the USRP RIO with 120 MHz bandwidth, it is set to 200 MHz; and for the FlexRIO design, it is set to 130 MHz.

The second clock domain is used for baseband processing. This clock rate must also fulfill the requirements for 80 MHz bandwidth support, for which a 256-point FFT must run for each OFDM symbol. The Xilinx FFT is set to the “Radix-4, Burst I/O” architecture to produce a continuous output of this core with minimum latency. With these settings, the FFT requires 871 cycles for loading data, executing, and unloading data, which are executed sequentially. Assuming an OFDM symbol duration of 3.6 µs with a short guard interval, this leads to a minimum clock rate of 871 / 3.6 µs ≈ 241.94 MHz. Note that a short guard interval is not implemented in version 1.1 of the 802.11 Application Framework, but the design is prepared for its future use. Hence, the baseband clock is set to 250 MHz for all targets. At this clock rate, the computation of each OFDM symbol of 4 µs duration (with long guard interval) can take up to 1,000 clock cycles.
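The clock-rate argument above reduces to two divisions, reproduced here in Python. The only inputs are the 871-cycle FFT figure and the symbol durations (3.2 µs FFT period plus 0.4 µs or 0.8 µs guard interval); the constant names are chosen for this sketch.

```python
FFT_CYCLES_PER_SYMBOL = 871  # load + execute + unload of the 256-point Radix-4, Burst I/O FFT
SHORT_GI_SYMBOL_US = 3.6     # 3.2 us FFT period + 0.4 us short guard interval
LONG_GI_SYMBOL_US = 4.0      # 3.2 us FFT period + 0.8 us long guard interval
BASEBAND_CLOCK_MHZ = 250.0

# One FFT must finish per OFDM symbol, so cycles / symbol duration (in us) gives the clock in MHz
min_clock_mhz = FFT_CYCLES_PER_SYMBOL / SHORT_GI_SYMBOL_US
cycle_budget_long_gi = BASEBAND_CLOCK_MHZ * LONG_GI_SYMBOL_US

print(f"minimum baseband clock: {min_clock_mhz:.2f} MHz")                # 241.94 MHz
print(f"cycles per 4 us symbol at 250 MHz: {cycle_budget_long_gi:.0f}")  # 1000
```

The 250 MHz choice leaves about 8 MHz of margin above the short-guard-interval minimum while giving a round budget of 1,000 cycles per long-guard-interval symbol.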

Both clock domains are asynchronous because they do not use the same reference clock. The Application Framework uses FIFOs to transfer data between the RF and the baseband loops. This transfer is straightforward for the RX chain because the samples are passed from the RF loop to the faster baseband loop as soon as they are available. The TX chain, however, produces a large number of samples per packet, which could overflow the FIFO toward the RF loop because that loop consumes samples at a fixed rate of 80 MS/s. To avoid overflows, the Application Framework generates the TX samples one OFDM symbol at a time, paced by a trigger generated in the RF clock domain. As a result, the FIFO is filled at the same rate at which it is read.

2.3.2 Timing Constraints

To ensure efficient use of the shared unlicensed spectrum, the 802.11 standard defines challenging requirements for the interframe timing, in particular for frame transmissions after a frame reception and for frame transmissions after channel sensing. Meeting those requirements requires tight integration of PHY and lower MAC functionalities. This subsection describes the requirements of the 802.11 specifications for systems with an OFDM-based PHY layer and compares the values assumed in the 802.11 specifications with the values actually achieved by the 802.11 Application Framework.

2.3.2.1 Timing Budget for Transmission after Frame Reception

Transmission after frame reception refers, for instance, to receiving a data frame and transmitting an ACK frame with SIFS. Figure 9 shows such a scenario and the definitions of processing delays used in the IEEE 802.11 standard (refer to [1] Figure 9-14 and Section 9.3.7).

Figure 9 Timing Relationships for Transmission After Frame Reception

Table 2 summarizes the requirements of the standard and compares them with the values of the 802.11 Application Framework. The RX and TX PHY delays, aRxPLCPDelay and aTxPLCPDelay, respectively, consist of the processing delays of the I/Q Processing and Bit Processing modules (refer to Sections 3.1.1 and 3.1.2 for more information). Separate RF channels are assumed for RX and TX, and no RX/TX switch is used; therefore, aRxTxSwitchTime and aTxRampOnTime are assumed to be zero. The assumed value of 12.0 µs for D1 follows from the relation SIFS = D1 + M1 + Rx/Tx, that is, 16 µs = 12 µs + 2 µs + 2 µs.

Name	Content	Assumption of IEEE 802.11 Standard (Section 18.4.4 of [1])	Value for 802.11 Application Framework
D1	aRxRFDelay + aRxPLCPDelay IQ proc. + aRxPLCPDelay bit proc.	~ 12.0 µs	0.68 µs + 2.66 µs + 7.41 µs = 10.75 µs
M1	aMACProcDelay 1	< 2.0 µs	< 2.0 µs
Rx/Tx	aTxPLCPDelay bit proc. + aTxPLCPDelay IQ proc. + aRxTxSwitchTime + aTxRampOnTime + aTxRFDelay = aRxTxTurnaroundTime	< 2.0 µs	0.0 µs + 0.0 µs + 0.0 µs + 0.0 µs + 1.29 µs = 1.29 µs
	Sum	16 µs	14.0 µs*
* MAC TX timing control ensures that interframe-spacing and slot timing requirements from IEEE specifications are met.

Table 2 Timing Budget for Transmission After Frame Reception
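The framework column of Table 2 can be re-added with a short script. Because aMACProcDelay is only specified as “< 2.0 µs”, the sketch uses that upper bound, which gives a worst case rather than a measured value; the dictionary keys are descriptive labels, not identifiers from the framework.

```python
# Framework values from Table 2, in microseconds
D1_US = {"aRxRFDelay": 0.68, "aRxPLCPDelay IQ proc.": 2.66, "aRxPLCPDelay bit proc.": 7.41}
M1_MAX_US = 2.0  # aMACProcDelay is given only as "< 2.0 us", so its upper bound is used
RX_TX_US = {"aTxPLCPDelay bit proc.": 0.0, "aTxPLCPDelay IQ proc.": 0.0,
            "aRxTxSwitchTime": 0.0, "aTxRampOnTime": 0.0, "aTxRFDelay": 1.29}
SIFS_US = 16.0

d1 = sum(D1_US.values())
rx_tx = sum(RX_TX_US.values())
worst_case = d1 + M1_MAX_US + rx_tx
print(f"D1 = {d1:.2f} us, Rx/Tx = {rx_tx:.2f} us, worst case = {worst_case:.2f} us")
print(f"margin to SIFS = {SIFS_US - worst_case:.2f} us")
```

The worst case of about 14.04 µs corresponds to the 14.0 µs in the table and stays roughly 2 µs below SIFS, consistent with the footnote that MAC TX timing control enforces the interframe spacing.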

2.3.2.2 Timing Budget for Transmission after Channel Sensing

Transmission after channel sensing refers to situations such as transmitting a data frame after a backoff procedure. Figure 10 visualizes such a scenario and the definitions of processing delays used in the IEEE 802.11 standard (refer to [1] Figure 9-14 and Section 9.3.7).

Figure 10 Timing Relationships for Transmission After Channel Sensing

Table 3 summarizes the requirements of the standard and compares them with the values of the 802.11 Application Framework. These requirements assume that D2 + CCAdel = Air Propagation Time + aCCATime, which can be derived from the definitions given in Section 9.3.7 of [1]. aCCATime refers to the time needed for performing the CCA operation.

Name	Content	Assumption of IEEE 802.11 Standard (Section 18.4.4 of [1])	Value for 802.11 Application Framework
Air Prop. Time		< 1.0 µs	Assume for instance 0.2 µs for 60 m distance
aCCATime	CCA detection time	< 4.0 µs	< 2.0 µs
M2	aMACProcDelay 2	< 2.0 µs	< 2.0 µs
Rx/Tx	aTxPLCPDelay bit proc. + aTxPLCPDelay IQ proc. + aRxTxSwitchTime + aTxRampOnTime + aTxRFDelay = aRxTxTurnaroundTime	< 2.0 µs	0.0 µs + 0.0 µs + 0.0 µs + 0.0 µs + 1.29 µs = 1.29 µs
	Sum	9 µs	5.5 µs*
* MAC TX timing control ensures that interframe-spacing and slot-timing requirements from IEEE specifications are met.

Table 3 Timing Budget for Transmission After Channel Sensing
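The same bookkeeping applies to Table 3. The sketch below derives the air propagation time for the 60 m example from the speed of light and takes the “< x µs” entries at their upper bounds, so the sum is again a worst case to compare against the 9 µs slot time.

```python
SPEED_OF_LIGHT_M_PER_US = 299.792458  # c in meters per microsecond
A_SLOT_TIME_US = 9.0


def air_propagation_us(distance_m: float) -> float:
    return distance_m / SPEED_OF_LIGHT_M_PER_US


# Framework values from Table 3; the "< x us" entries are taken at their upper bound
budget_us = {
    "air propagation, 60 m": air_propagation_us(60.0),
    "aCCATime": 2.0,
    "M2 (aMACProcDelay)": 2.0,
    "Rx/Tx (aRxTxTurnaroundTime)": 1.29,
}
total_us = sum(budget_us.values())
for name, value in budget_us.items():
    print(f"{name:30s} {value:5.2f} us")
print(f"{'sum':30s} {total_us:5.2f} us of a {A_SLOT_TIME_US:.0f} us slot")
```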

2.3.3 Level Plan and Baseband Operating Points

2.3.3.1 DAC Headroom, ADC Headroom, Signal Power

The D/A converter (DAC) and A/D converter (ADC) should operate in a manner that avoids clipping and saturation of the outgoing and incoming signal, respectively. For proper adjustment of the DAC and ADC operating points, consider the following factors:

OFDM has a high peak-to-average power ratio (PAPR) of approximately 9 dB to 12 dB. Therefore, a DAC headroom of 15 dB avoids clipping.
On the receiver side, signals on adjacent channels can be 10 dB stronger than the desired signal. An ADC headroom of 25 dB also handles such signal conditions without driving the ADC into saturation. For example, with an ADC resolution of 14 bits (as for the USRPs) and a noise figure of approximately 9 dB, approximately 8 effective bits are available for baseband processing, which is sufficient for up to 256-QAM.
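To get a feel for the PAPR figures behind the DAC headroom, the following NumPy sketch measures the PAPR of random OFDM symbols. It assumes QPSK on the 52 occupied subcarriers of a 20 MHz 802.11a/VHT20 symbol, a 64-point IFFT with 4× oversampling so that peaks between samples are visible, and no guard interval; it is a simulation aid, not part of the framework's level plan, and results change with the modulation and the random seed.

```python
import numpy as np

# 802.11a / VHT20 occupied subcarriers: -26..-1 and +1..+26 (DC unused)
USED_SUBCARRIERS = np.r_[-26:0, 1:27]
DAC_HEADROOM_DB = 15.0


def ofdm_papr_db(rng: np.random.Generator, n_fft: int = 64, oversample: int = 4) -> float:
    """PAPR of one random QPSK OFDM symbol (guard interval omitted for simplicity)."""
    n = n_fft * oversample  # zero-padded IFFT length for oversampling
    qpsk = (rng.choice([-1.0, 1.0], USED_SUBCARRIERS.size)
            + 1j * rng.choice([-1.0, 1.0], USED_SUBCARRIERS.size)) / np.sqrt(2)
    spectrum = np.zeros(n, dtype=complex)
    spectrum[USED_SUBCARRIERS % n] = qpsk  # negative frequencies wrap to the top of the IFFT input
    power = np.abs(np.fft.ifft(spectrum)) ** 2
    return float(10 * np.log10(power.max() / power.mean()))


rng = np.random.default_rng(seed=1)
paprs = np.array([ofdm_papr_db(rng) for _ in range(5000)])
print(f"median PAPR {np.median(paprs):.1f} dB, 99.9th percentile {np.percentile(paprs, 99.9):.1f} dB")
print(f"symbols above {DAC_HEADROOM_DB:.0f} dB: {np.mean(paprs > DAC_HEADROOM_DB):.4%}")
```

With constant-modulus QPSK on 52 subcarriers, the PAPR can never exceed 10 · log10(52) ≈ 17.2 dB, and in practice nearly all symbols stay well below the 15 dB headroom.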

2.3.3.2 Reference Signals According to IEEE 802.11a/ac

TX and RX IQ and bit processing and the related submodules of the 802.11 Application Framework follow the power normalization of the test vector generation tool provided by IEEE [3]. The test vector generation tool follows Equation (22-11) of Section 22.3.7.4 of [2] and, hence, the power of the complex-valued baseband signal is normalized to be equal to 1. TX and RX IQ and bit processing and the related submodules of the 802.11 Application Framework are tested against vectors generated by this tool. For any extension toward standard-compliant PHY features, NI strongly recommends that you test against vectors generated with this tool.

2.3.3.3 Complex Fixed Point (Format, Precision)

The mixed-signal processing output data type is a <1.15> fixed-point value. Because of the peak-to-average power ratio of OFDM (refer to Section 2.3.3.1), the output must be extended by two bits in the integer part. Because each bit of a complex data type corresponds to 6 dB in signal power, these two bits add a headroom of 12 dB to the numeric representation. The data path interface between the baseband and mixed-signal processing consists of a <3.13> fixed-point value on both the TX and RX path. The conversion is done by reinterpreting the same 16 bits. The <3.13> fixed-point format allows you to compare against the reference signal of [3] (refer to Section 2.3.3.2). Furthermore, changing between simulation and operation mode does not require a change in scaling. In general, the precision of the fixed-point logic follows the fixed-point requirements of the implemented algorithms and is not optimized with regard to resource usage.
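The bit reinterpretation can be made concrete: the same 16-bit word read as <3.13> instead of <1.15> is worth 2² = 4 times as much, which is 20 · log10(4) ≈ 12 dB. A baseband sample of amplitude 1.0 in <3.13> therefore occupies only a quarter of the mixed-signal full scale. The helper `fxp_value` below is hypothetical and assumes that the integer part includes the sign bit, as the ranges [-1, 1) and [-4, 4) imply.

```python
import math

WORD_BITS = 16


def fxp_value(raw: int, integer_bits: int) -> float:
    """Value of a signed 16-bit two's-complement word read with `integer_bits` integer bits (sign included)."""
    if not -(1 << (WORD_BITS - 1)) <= raw < (1 << (WORD_BITS - 1)):
        raise ValueError("raw must fit in a signed 16-bit word")
    return raw / (1 << (WORD_BITS - integer_bits))


raw = 0x2000                 # an arbitrary 16-bit sample word
as_1_15 = fxp_value(raw, 1)  # read as <1.15>: range [-1, 1)
as_3_13 = fxp_value(raw, 3)  # read as <3.13>: range [-4, 4)
print(as_1_15, as_3_13, round(20 * math.log10(as_3_13 / as_1_15), 2))  # prints: 0.25 1.0 12.04
```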

More information about fixed-point formats and related precisions to avoid clipping and saturation can be found in the following sections:

Table 5 of Section 3.1.1.3 referring to synchronization
Table 6 of Section 3.1.1.4 referring to RX IQ processing
Table 12 of Section 3.1.2.6 referring to TX IQ processing

2.3.4 Channelization

The different supported channelization options are illustrated in Figure 11.

For the 802.11ac format, the PHY RX must dynamically switch based on the channel bandwidth information in VHT-SIG-A. This switching is performed by selecting the correct range in the frequency domain without changing the RF front-end center frequency, for example, switching between options 2, 4, and 5 in Figure 11. Therefore, you must set the RF front-end frequency to an 802.11 channel based on the widest bandwidth supported by the device, using the channel sets given in the tables in Annex E of [2]. Otherwise, the wrong set of 20 MHz subbands could be concatenated for 40 MHz transmissions with other devices.

Figure 11 Channelization Used in the Application Framework

The 256-point FFT of the PHY covers 80 MHz of bandwidth. Based on the 20 MHz bandwidth of the 802.11a signal, the 80 MHz bandwidth can be divided into four subbands of 20 MHz (trapezoids in Figure 11). The parts highlighted in blue are actually used by the PHY, and the rest is filled with zeros. Two controls on the FPGA top level are used to choose one of the illustrated options. The switch control DC centered signal determines if the 802.11 channel center frequency equals the RF center frequency. The numeric control channel selector determines which subband is used as the primary subband. The resulting configuration for each option is listed in Table 4.

Option	Bandwidth	80 MHz Real-Time Bandwidth Supported	DC Centered Signal	Channel Selector	Difference Between RF Center Frequency and 802.11 Channel Center Frequency
1	20 MHz	N/A	enabled	N/A	0 MHz
2	20 MHz	yes	disabled	0	-30 MHz
2	20 MHz	N/A	disabled	1	-10 MHz
2	20 MHz	N/A	disabled	2	+10 MHz
2	20 MHz	yes	disabled	3	+30 MHz
3	40 MHz	no	disabled	N/A	0 MHz
4	40 MHz	yes	disabled	0 or 1	-20 MHz
4	40 MHz	yes	disabled	2 or 3	+20 MHz
5	80 MHz	yes	disabled	N/A	0 MHz

Table 4 Configuration for Channelization Options
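The offsets in Table 4 follow from splitting the 80 MHz FFT span into four 20 MHz subbands whose centers lie at -30, -10, +10, and +30 MHz. The Python sketch below encodes that rule and derives the RF center frequency for a desired channel. It assumes that subband 0 is the lowest in frequency and that the table's last column means 802.11 channel center minus RF center; both functions are hypothetical helpers, not framework VIs.

```python
from typing import Optional


def channel_offset_mhz(option: int, channel_selector: Optional[int] = None) -> int:
    """802.11 channel center minus RF center for the options in Table 4 (assumed sign convention)."""
    if option in (1, 3, 5):
        return 0  # signal centered on the RF center frequency
    if channel_selector is None or not 0 <= channel_selector <= 3:
        raise ValueError("options 2 and 4 need a channel selector in 0..3")
    if option == 2:
        # centers of the four 20 MHz subbands of the 80 MHz FFT span: -30, -10, +10, +30 MHz
        return -30 + 20 * channel_selector
    if option == 4:
        # subbands {0, 1} form the lower 40 MHz channel, {2, 3} the upper one
        return -20 if channel_selector < 2 else 20
    raise ValueError("unknown channelization option")


def rf_center_mhz(channel_center_mhz: int, option: int, channel_selector: Optional[int] = None) -> int:
    """RF front-end center frequency that places the 802.11 channel at the selected position."""
    return channel_center_mhz - channel_offset_mhz(option, channel_selector)


# Channel 36 (5180 MHz) as subband 0 puts the RF center at 5210 MHz, the center of 80 MHz channel 42
print(rf_center_mhz(5180, option=2, channel_selector=0))  # prints: 5210
```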

Option 1 is limited to the 802.11a format, and the DC centered signal switch is enabled. The channel selector control is ignored in this option. The signal is centered within the frequency domain. Set the center frequency of the RF front end to the center frequency of the 802.11 channel as given in Annex E of [1].

All other options correspond to the 802.11ac format and require the DC centered switch to be disabled. The channel selector control selects the primary subband numbered from zero to three, which is used for non-HT reception.

For version 1.1, options 2 and 3 are valid for all RF devices. Set the RF front-end center frequency to the center of a 40 MHz wide 802.11 channel. Subbands 1 and 2 are used for the non-HT portion of the VHT format preamble transmission as described in Section 22.3.8.2 of [2]. VHT 20 MHz mode uses the primary subband only.

2.3.5 Global Timestamp

The system generates a global timestamp, which is derived from the Data Clock (refer to Figure 8). Its granularity is 0.1 µs, and it is used as a time base of MAC modules and the event tracing.
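One way to derive a 0.1 µs timestamp from the data clock is a simple cycle divider. The following Python model is a hypothetical sketch of that idea, not the framework's implementation; it also shows that all three data clock rates of Section 2.3.1 (120, 200, and 130 MHz) divide evenly into 0.1 µs ticks (12, 20, and 13 cycles).

```python
TICKS_PER_US = 10  # 0.1 us timestamp granularity


class GlobalTimestamp:
    """Hypothetical model: divide the data clock down to 0.1 us timestamp ticks."""

    def __init__(self, data_clock_mhz: int) -> None:
        if data_clock_mhz % TICKS_PER_US:
            raise ValueError("data clock must be a multiple of 10 MHz for exact 0.1 us ticks")
        self.cycles_per_tick = data_clock_mhz // TICKS_PER_US  # e.g. 120 MHz -> 12 cycles per tick
        self._cycles = 0
        self.ticks = 0  # timestamp in units of 0.1 us

    def on_data_clock_cycle(self) -> None:
        self._cycles += 1
        if self._cycles == self.cycles_per_tick:
            self._cycles = 0
            self.ticks += 1


for clock_mhz in (120, 200, 130):
    ts = GlobalTimestamp(clock_mhz)
    for _ in range(clock_mhz * 10):  # 10 us worth of data clock cycles
        ts.on_data_clock_cycle()
    print(clock_mhz, ts.cycles_per_tick, ts.ticks)  # 100 ticks = 10 us for every clock rate
```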

Implementation Details

3.1 FPGA

3.1.1 RX

3.1.1.1 Overview

The RX baseband operates in the baseband clock domain of 250 MHz. Figure 12 shows the RX baseband block diagram. Blue arrows indicate the data path, and yellow arrows indicate the control path. The following sections describe the individual blocks in detail.

The data source block selects the source for the receiver. Data can be taken from RF, from the TX baseband using internal loopback, or from the host using a host-to-target FIFO. The stream always has a sample rate of 80 MS/s for all sources. The synchronization block detects the packet start and compensates for the estimated carrier frequency offset. In parallel, the power measurement block calculates the received signal power. The stream is passed to the RX I/Q Processing block, where the samples are transformed to the frequency domain and channel estimation, equalization, and phase tracking are performed. The constellation, together with field assignment information, is provided to the RX Bit Processing block. Inside this block, the modulation is reversed, and the bits are deinterleaved, decoded using a Viterbi core, and descrambled. This bit stream is passed to the RX PHY state machine, which interprets the signal fields in the PPDU, such as L-SIG and VHT-SIG-A, and generates control information for I/Q processing, bit processing, and the MAC. The PSDU, which can be an MPDU or an A-MPDU, is extracted from the bit stream and delivered to the MAC as unsigned bytes. The MAC interprets the header information of the PSDU and transfers data to the host application using the LabVIEW target-to-host interface.

Every module is designed to keep up with the data rate from the upstream module, so there is no need for throttle control inside the modules. The timing of the transfers is described in the following sections.

Figure 12 RX Baseband Block Diagram

3.1.1.2 Power Measurement

The power measurement module calculates the baseband signal power and the RF input power. The block diagram is shown in Figure 13.

Figure 13 Power Measurement Block Diagram

Based on the incoming samples, x, the signal power, s, is calculated over a window of 64 samples as described in Equation 1. The output of this calculation is updated every 64 samples. The next step is the iterative calculation of the base-10 logarithm: the value of s is shifted left n times until its MSB is one. The number of shifts, n, and an LUT indexed by the six MSBs of the shifted value s’ are used to calculate the signal power p in logarithmic scale. This value represents the baseband signal power in dBFS.

Equation 1 Signal Power Calculation
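The shift-and-LUT logarithm can be sketched in Python as follows. Because this text states neither the word width of s nor the dBFS reference, the sketch assumes an unsigned 32-bit word whose full scale is 2³² and an LUT that stores 10 · log10(m / 64) for the six MSBs m of s’ (always 32 to 63, because the MSB is one). Truncating to six bits keeps the error below 10 · log10(33/32) ≈ 0.13 dB.

```python
import math

WORD_BITS = 32  # assumed width of the unsigned power word s (not stated in the text)
LUT_BITS = 6    # the LUT is addressed by the six MSBs of the shifted value s'

# After normalization the MSB is one, so the six MSBs m lie in 32..63 and represent s' ~ m / 64 of full scale
LOG_LUT_DB = {m: 10 * math.log10(m / 2**LUT_BITS) for m in range(2**(LUT_BITS - 1), 2**LUT_BITS)}
SHIFT_DB = 10 * math.log10(2)  # each left shift corresponds to about 3.01 dB


def power_dbfs(s: int) -> float:
    """Approximate 10*log10(s / 2**WORD_BITS) with a leading-one search and a 6-bit LUT."""
    if not 0 < s < 2**WORD_BITS:
        raise ValueError("s must be a positive 32-bit value")
    n = 0
    while not s & (1 << (WORD_BITS - 1)):  # shift left until the MSB contains a one
        s <<= 1
        n += 1
    m = s >> (WORD_BITS - LUT_BITS)        # six MSBs of s'
    return LOG_LUT_DB[m] - n * SHIFT_DB


for s in (1, 1000, 123456789, 2**32 - 1):
    print(s, round(power_dbfs(s), 3), round(10 * math.log10(s / 2**WORD_BITS), 3))
```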

Based on p, the RF input power r is calculated using the power calibration offset and the RF gain (see Equation 2). Both values are provided by the host. The analog gain value is subtracted from p because gain applied before the ADC makes the measured signal power higher than the RF input power. The power calibration offset is based on the calibration data of the device. It maps the baseband signal power at minimum gain to the corresponding reference power level⁴ at the RF input port. This mapping is assumed to be linear at all gain levels.

Equation 2 RF Input Power Calculation

The value of r is compared against the given CCA energy threshold. If this threshold is exceeded, the CCA energy detect signal is asserted. Together with the baseband and the RF power, this value is available at the output of the Power Measurement VI.
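Because Equation 2 itself is not reproduced in this text, the following Python sketch assumes the form r = p − gain + calibration offset, which matches the description above (gain subtracted, offset mapping dBFS to the RF input level), and then applies the CCA energy threshold. The names and the example numbers are illustrative only and are not calibration data of any device.

```python
from dataclasses import dataclass


@dataclass
class PowerSettings:
    """Values provided by the host (names are hypothetical)."""
    power_calibration_offset_db: float  # maps baseband dBFS at minimum gain to dBm at the RF input
    rf_gain_db: float                   # analog gain applied before the ADC


def rf_input_power_dbm(p_dbfs: float, settings: PowerSettings) -> float:
    # Assumed form of Equation 2: gain applied before the ADC raises p, so it is subtracted;
    # the calibration offset translates dBFS into dBm at the RF input port.
    return p_dbfs - settings.rf_gain_db + settings.power_calibration_offset_db


def cca_energy_detect(p_dbfs: float, settings: PowerSettings, threshold_dbm: float) -> bool:
    """Assert CCA energy detect when the RF input power exceeds the threshold."""
    return rf_input_power_dbm(p_dbfs, settings) > threshold_dbm


settings = PowerSettings(power_calibration_offset_db=-10.0, rf_gain_db=20.0)
print(rf_input_power_dbm(-30.0, settings), cca_energy_detect(-30.0, settings, threshold_dbm=-62.0))
# prints: -60.0 True
```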

3.1.1.3 Synchronization

The purpose of the synchronization module is to find the packet start in the continuous sample stream. For the implemented algorithm, the ideal packet start position is the center of the L-LTF field, also referred to as the first sample of L-LTF 2 (the second OFDM symbol of L-LTF).

The block diagram of the synchronization unit is shown in Figure 14 and details of data types, control information, and identifiers used in equations can be found in Table 5.

The synchronization is fed from the data source with a sample rate of 80 MS/s. In the baseband clock domain of 250 MHz, a valid sample therefore arrives only about every third clock cycle (250/80 ≈ 3.1). Each VI must use the enable chain to update only on valid samples. No VI changes the data rate.

Figure 14 Synchronization Block Diagram

Module	Identifier	Output Data Type	Output Control Information
Autocorrelation	a	UFXP 2.14	-
Timing Metric Calculation	tm	FXP 2.6	-
Timing Metric Peak Search	-	U16, FXP 2.20	Peak index, CFO
Timing Metric Valley Search	-	U16	Valley index
Timing Metric Evaluation	-	Boolean, U16, FXP 2.20	CCA signal detect, packet start sample index, CFO
CFO Removal	-	CFX 3.13	-
Frame Alignment	-	CFX 3.13	Packet start

Table 5 Synchronization Data Types and Control Information

The synchronization block is implemented in two parallel paths (refer to Figure 14) to minimize the latency on the data path. The upper path finds the packet start sample index and estimates the carrier frequency offset (CFO) based on the Schmidl and Cox algorithm [4]. These estimates are used by the lower path, which is the main data path, to compensate CFO and generate the packet start pulse for downstream modules.

For testing purposes, there is a bypass for the synchronization block where the packet start index can be given from the host. This path is not included in Figure 14. Use this bypass in combination with RX samples from the host or internal loopback to characterize the RX baseband without the impact of synchronization algorithms.

As shown in Figure 14, the upper path of the synchronization block starts by calculating the autocorrelation of the received signal x (see Equation 3). Because one period of the non-HT short training field is 64 samples long at a sample rate of 80 MS/s (refer to Section 18.3.3 of [1]), the length CP of the autocorrelation window (see Equation 3) is set to 64. The normalized magnitude and the phase of the autocorrelation, s, are provided at the output of the autocorrelation module for each sample. Under ideal conditions, this autocorrelation scheme results in a normalized magnitude equal to 1 during the L-STF, as shown in Figure 15.

Equation 3 Synchronization Autocorrelation

Figure 15 Simplified Signal Charts of Synchronization

To find the transition from L-STF to L-LTF, a so-called synchronization timing metric is calculated based on the magnitude of the normalized autocorrelation, as shown in Equation 4. The ideal behavior of this timing metric, tm, is also shown in Figure 15.

Equation 4 Synchronization Timing Metric

Based on the indices of the minimum and maximum of this metric and the distance between them, the sample index of the packet start is calculated using the following steps:

The timing metric valley search looks for the valley of the timing metric. Under ideal conditions, this valley is 2 * CP samples after the start of L-STF. A valley is found if a given number of samples lie below a defined threshold. The valley search reports the sample index of the center of all samples below the threshold as the valley index. A 16-bit wrapping counter provides that index.
The timing metric peak search looks for the peak of the timing metric. Under ideal conditions, this index is CP samples after the start of L-LTF. It uses the same principle as the valley search. In addition, it captures the autocorrelation phase on the first value that exceeds the threshold; this phase is the basis of the CFO estimate.
The Timing Metric Evaluation module combines the two search results. It checks that the distance between valley and peak lies within 9 * CP ± CP. As an additional check, the autocorrelation magnitude must be below a given threshold. This check is valid for a packet start because the autocorrelation reports a low value during L-LTF.
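The search and evaluation steps can be sketched in Python as follows. This is a behavioral model under assumptions, not the FPGA implementation: the thresholds, the minimum run length, reporting a run's center when the run ends, and the function names are hypothetical. The modular distance shows why a 16-bit wrapping counter still yields a correct valley-to-peak distance when the counter overflows between the two events.

```python
WRAP = 1 << 16  # sample indices come from a 16-bit wrapping counter


def run_centers(metric, threshold, min_len, below):
    """Center index (mod 2**16) of every run of >= min_len consecutive samples
    below (valley search) or above (peak search) the threshold."""
    centers, start = [], None
    for n, v in enumerate(list(metric) + [None]):      # None closes a trailing run
        inside = v is not None and (v < threshold if below else v > threshold)
        if inside and start is None:
            start = n
        elif not inside and start is not None:
            if n - start >= min_len:
                centers.append((start + (n - start - 1) // 2) % WRAP)
            start = None
    return centers


def evaluate(valley_idx, peak_idx, autocorr_mag, ac_threshold, cp=64):
    """Accept a packet start if the peak follows the valley by 9*CP +/- CP samples
    and the autocorrelation magnitude is low (as it is during L-LTF)."""
    distance = (peak_idx - valley_idx) % WRAP          # modular: survives counter wrap
    return 8 * cp <= distance <= 10 * cp and autocorr_mag < ac_threshold
```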

Based on this algorithm and corresponding processing delays, the packet start sample index is calculated and given to the Frame Alignment module. During the calculation, the number of samples to cut into the OFDM guard interval is taken into account.

In addition to timing estimation, the phase of the autocorrelation is averaged over CP samples and used for CFO estimation. The CFO estimate is based on the phase output of the peak search.

In the lower path of the synchronization block, the estimated CFO is compensated by applying a digital frequency shift. The CFO estimate is used for all OFDM symbols of the entire packet. The Frame Alignment module generates the packet start trigger pulse at the sample index given by the timing metric evaluation.
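A minimal sketch of the CFO estimation and removal, assuming the averaged autocorrelation phase corresponds to the phase advance over CP = 64 samples at 80 MS/s. A circular mean (averaging unit phasors) is used here instead of a plain average of phase values; that is a swapped-in choice, and the framework's averaging may differ. The names are illustrative.

```python
import cmath
import math

FS = 80e6                     # sample rate in S/s
CP = 64                       # autocorrelation lag in samples
MAX_CFO_HZ = FS / (2 * CP)    # unambiguous range: +/- 625 kHz (phase limited to +/- pi)


def estimate_cfo(autocorr_phases):
    """Average autocorrelation phases (rad) and convert them to a CFO in Hz.

    The circular mean avoids errors when phases straddle +/- pi (an assumption).
    """
    mean_phase = cmath.phase(sum(cmath.exp(1j * p) for p in autocorr_phases))
    return mean_phase * FS / (2 * math.pi * CP)


def remove_cfo(x, cfo_hz, first_index=0):
    """Digital frequency shift by -cfo_hz; one estimate is used for the whole packet."""
    return [v * cmath.exp(-2j * math.pi * cfo_hz * (first_index + n) / FS)
            for n, v in enumerate(x)]
```

Because the phase is limited to ±π, offsets beyond ±625 kHz (80 MHz / (2 × 64)) alias with this lag.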

After the synchronization has indicated a packet start, further packet start signals are blocked until the synchronization is rearmed by the PHY RX end indication, which is generated at the end of the packet. The asserted CCA signal detect output indicates this blocked status.

Figure 16 Synchronization Latency

The latencies of the modules in the Synchronization block are illustrated in Figure 16. The left part of the figure contains the modules of the upper path, whose latencies sum to 68 clock cycles. At a sample rate of 80 MS/s and a 250 MHz clock rate, this time is equivalent to about 22 samples (68 × 80/250 ≈ 21.8). Since the peak of the timing metric is located 256 samples before L-LTF-2, the packet start index is calculated before the packet start signal is asserted, so there is no effective delay.

The latency of the lower data path is shown in the right part of Figure 16. This latency increases the length of the RX processing path by 15 clock cycles.

3.1.1.4 RX IQ Processing

The purpose of the RX IQ Processing block is to restore the transmitted I/Q constellation. The block diagram is shown in Figure 17. Details of data types, control information, and identifiers used in equations are presented in Table 6.

Figure 17 RX IQ Processing Block Diagram

Module	Identifier	Output Data Type	Output Control Information
Synchronization	-	CFX 3.13	Packet Start
Sample Timing Generation	-		Sample Timing
Cyclic Prefix Removal	-		Sample Timing
FFT	R	CFX 4.21	OFDM symbol index
Demapper	-		Field Map, Subcarrier Timing
Channel Estimation	H_est		Subcarrier index
Channel Equalization	Y_est	CFX 2.14	Field Map, Subcarrier Timing
Pilot Phase Estimation	β	FXP 1.14	-
Phase Correction	X_est	CFX 2.14	Field Map, Subcarrier Timing
VHT-SIG-A2 Rotation	-	CFX 2.14	Field Map, Subcarrier Timing

Table 6 RX IQ Processing Data Types and Control Information

The Sample Timing Generation module receives samples from the Synchronization module along with the packet start index. It starts passing samples to downstream modules as soon as the packet start signal is asserted, and it stops as soon as the last OFDM symbol, whose index is given by the RX PHY state machine, is finished. The control information is carried by the sample timing cluster, which contains the following elements:

OFDM symbol index
Sample index (within the OFDM symbol; 0 to 319)
Packet start flag
OFDM symbol start flag
Valid flag

The sample index is used by the Cyclic Prefix Removal module to invalidate the first 64 samples of each OFDM symbol.
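The sample timing cluster and the cyclic prefix removal can be modeled in Python as below. The class and function names are hypothetical, and numbering OFDM symbols from 0 at the packet start is an assumption made only for this sketch.

```python
from dataclasses import dataclass, replace

SAMPLES_PER_SYMBOL = 320   # 4 us OFDM symbol at 80 MS/s
CP_LEN = 64                # 0.8 us guard interval at 80 MS/s


@dataclass(frozen=True)
class SampleTiming:
    ofdm_symbol_index: int
    sample_index: int          # 0..319 within the OFDM symbol
    packet_start: bool
    ofdm_symbol_start: bool
    valid: bool


def sample_timing(num_samples, last_symbol_index):
    """Timing clusters from the packet start until the last OFDM symbol has finished."""
    for n in range(num_samples):
        symbol, index = divmod(n, SAMPLES_PER_SYMBOL)
        if symbol > last_symbol_index:
            return
        yield SampleTiming(symbol, index, n == 0, index == 0, True)


def remove_cyclic_prefix(timing):
    """Invalidate the first CP_LEN samples of each OFDM symbol."""
    return replace(timing, valid=timing.valid and timing.sample_index >= CP_LEN)
```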

The next downstream module is the FFT, which is a wrapper for the Xilinx FFT core. It performs a 256-point FFT using a Radix-4, Burst I/O architecture. A toggling negation of the input samples (alternately multiplying them by +1 and -1) realizes the FFT shift, so that DC appears at output index 128. The FFT starts execution as soon as 256 samples are provided. During execution, the core takes no samples on its input, so a FIFO placed before the input captures the samples that arrive in the meantime. On finishing execution, the 256 subcarriers are provided consecutively at the output. The OFDM symbol index from the incoming sample timing cluster is passed through this module in parallel to the data stream. The maximum gain of the FFT is 256 if the energy is limited to only one subcarrier. Therefore, the fixed-point data type is extended by nine bits to capture this output dynamic range of the FFT module. The output of the FFT is divided by 256 to have the same scaling as the input of the IFFT in the transmitter chain. The resulting fixed-point format is <4.21>.
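The toggling negation works because multiplying the input by (-1)^n shifts the DFT output by N/2 bins. The sketch below checks this with a direct DFT of size 8 standing in for the 256-point Xilinx core; the function names are illustrative, and the 1/N scaling mirrors the division by 256 described above.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT; a stand-in for the FFT core in this sketch."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def fft_shifted_scaled(x):
    """Negate every other input sample so DC lands at index N/2, then divide by N."""
    n_pts = len(x)
    toggled = [v if n % 2 == 0 else -v for n, v in enumerate(x)]
    return [v / n_pts for v in dft(toggled)]
```

With N = 256, the same input negation moves DC to output index 128.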

The Demapper block aligns two control information clusters with the data stream. The first cluster is the subcarrier timing cluster, which contains the following elements:

OFDM symbol index
Subcarrier index (0 to 255)
Frequency offset index (named k in [1] and [2]; -128 to 127)
OFDM symbol start flag
Valid flag

The frequency offset index is generated based on the control information from the RX PHY state machine (refer to Section 3.1.1.6). The second control information cluster is the field map. This cluster is made up of Booleans, and each Boolean represents one field of the 802.11 packet structure, such as L-SIG, L-LTF, VHT-SIG-A, pilot subcarrier, or data subcarrier. Similar to a one-hot code, only one of these Booleans is asserted for each sample. The packet structure is known to the Demapper module. Downstream modules can use this field map to filter for specific fields, such as the pilot subcarriers.
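The following sketch shows the mapping from FFT output position to frequency offset index and a one-hot field map used for downstream filtering. The field names form an illustrative subset; the framework's actual field list and data layout are not reproduced here.

```python
from enum import Enum

FFT_SIZE = 256


class Field(Enum):
    """Illustrative subset of the field map; the real list may differ."""
    L_LTF = "L-LTF"
    L_SIG = "L-SIG"
    VHT_SIG_A = "VHT-SIG-A"
    PILOT = "pilot subcarrier"
    DATA = "data subcarrier"


def frequency_offset_index(subcarrier_index):
    """Map the FFT output position 0..255 (DC at 128) to k = -128..127."""
    return subcarrier_index - FFT_SIZE // 2


def field_map(active):
    """One-hot field map: only the Boolean of the active field is asserted."""
    return {field: field is active for field in Field}


def select(values, maps, wanted):
    """Downstream filtering, e.g. picking out the pilot subcarriers."""
    return [v for v, m in zip(values, maps) if m[wanted]]
```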

The channel estimation is computed using the second L-LTF OFDM symbol for 802.11a and the VHT-LTF for 802.11ac. The inverse channel transfer function H_est is calculated individually for each received subcarrier value R using the L-LTF definitions L from Section 22.3.8.2.3 of [2], as shown in Equation 5. The signal names are included in Figure 17. The frequency offset index k from the subcarrier timing cluster selects the corresponding value of L. The channel estimation block is implemented in a parallel path to minimize latency on the data path. The values of H_est are passed to the Channel Equalization module, which stores them in memory; they have the same data type as the incoming subcarriers. Beginning with the L-SIG, the channel equalization uses these values to apply zero forcing, producing the signal Y_est. The fixed-point format <2.14> is sufficient to represent the values of Y_est; larger values are saturated.

Equation 5 Channel Estimation and Compensation
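Equation 5 is not reproduced in this text. A common form consistent with the description is H_est(k) = L(k) / R(k) for the inverse channel transfer function and Y_est(k) = R(k) · H_est(k) for zero forcing; the sketch below assumes that form, along with per-component saturation to the signed <2.14> range [-2, 2 - 2^-14]. Function names are hypothetical.

```python
FXP_MAX = 2 - 2 ** -14   # largest signed <2.14> component; the smallest is -2


def estimate_inverse_channel(r_ltf, l_seq):
    """H_est[k] = L[k] / R[k] on used subcarriers; 0 where L[k] == 0 (assumed form)."""
    return [l / r if l != 0 and r != 0 else 0j for r, l in zip(r_ltf, l_seq)]


def _saturate(a):
    return max(-2.0, min(FXP_MAX, a))


def zero_forcing(r_data, h_est):
    """Y_est[k] = R[k] * H_est[k], saturated per component to the <2.14> range."""
    return [complex(_saturate((r * h).real), _saturate((r * h).imag))
            for r, h in zip(r_data, h_est)]
```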

The signal Y_est is passed to the pilot phase modules, which follow the same structure as the channel estimation and equalization. The residual carrier frequency offset remaining after synchronization, together with the removal of the cyclic prefix, leads to a phase jump between consecutive OFDM symbols. The phase α_n of the current OFDM symbol n is calculated based on the pilot sequences P, which are taken from Section 18.3.5.10 of [1] and Section 22.3.10.10 of [2] at the frequency offset index k. The phase offset between OFDM symbols is compensated by adding the difference to the last phase estimate from OFDM symbol n - 1. The Phase Correction module applies the estimated phase β of OFDM symbol n to OFDM symbol n + 1. This operation does not change the magnitude of the values, so the fixed-point format is kept.

Equation 6 Phase Estimation and Compensation
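Equation 6 is not reproduced in this text. The sketch below uses a common pilot-based estimator, the argument of the sum of Y_est(k) · conj(P(k)) over the pilot subcarriers, and applies the estimate from OFDM symbol n to symbol n + 1 as described above. It estimates each symbol's phase directly instead of accumulating differences, which is a simplification; names are illustrative.

```python
import cmath


def estimate_pilot_phase(pilots, references):
    """Common phase of one OFDM symbol: arg(sum_k Y_est[k] * conj(P[k])) over the pilots."""
    return cmath.phase(sum(y * p.conjugate() for y, p in zip(pilots, references)))


def correct_phase(symbol, beta):
    """Rotate every subcarrier by -beta; magnitudes (and the <2.14> format) are unchanged."""
    rotation = cmath.exp(-1j * beta)
    return [v * rotation for v in symbol]


def track_phase(symbols, pilot_positions, pilot_references):
    """Apply the estimate from OFDM symbol n to symbol n + 1 (one-symbol lag)."""
    beta, corrected = 0.0, []
    for symbol, refs in zip(symbols, pilot_references):
        corrected.append(correct_phase(symbol, beta))
        beta = estimate_pilot_phase([symbol[i] for i in pilot_positions], refs)
    return corrected
```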

As the last step of the RX I/Q processing, the clockwise rotation of VHT-SIG-A2 (refer to section 22.3.4.5 of [2]) is reversed.

Module	Output Timing
Sample Timing Generation	~ 1 sample / 3 clock cycles (320 samples per OFDM symbol)
Cyclic Prefix Removal	~ 1 sample / 3 clock cycles (256 samples per OFDM symbol)
FFT	256 subcarriers / OFDM symbol burstwise
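Demapper	256 subcarriers / OFDM symbol burstwise
Channel Estimation	256 subcarriers / OFDM symbol burstwise
Channel Equalization	256 subcarriers / OFDM symbol burstwise
Pilot Phase Estimation	1 phase estimate / OFDM symbol
Phase Correction	256 subcarriers / OFDM symbol burstwise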
VHT-SIG-A2 Rotation	256 subcarriers / OFDM symbol burstwise

Table 7 RX IQ Processing Transfer Timing

The RX I/Q Processing module changes the timing of the data stream; Table 7 summarizes it for all submodules. The input is provided by the digital downconversion at a sample rate of 80 MS/s. The Cyclic Prefix Removal module removes 64 samples per OFDM symbol from the stream. Because of the chosen FFT architecture configuration, the Xilinx core provides its output burstwise. All downstream modules keep this transfer timing, except the Pilot Phase Estimation module, which computes one phase estimate per OFDM symbol.
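The transfer timings follow from the ratio of clock rate to sample rate. This small Python sketch, with hypothetical helper names, shows that a 320-sample OFDM symbol spans exactly 1000 clock cycles at 250 MHz and converts a latency in clock cycles into samples.

```python
from fractions import Fraction

CLOCK_HZ = 250_000_000
SAMPLE_RATE = 80_000_000
CLOCKS_PER_SAMPLE = Fraction(CLOCK_HZ, SAMPLE_RATE)      # 25/8 = 3.125


def symbol_clocks(samples_per_symbol=320):
    """Clock cycles per OFDM symbol (320 samples including the cyclic prefix)."""
    return samples_per_symbol * CLOCKS_PER_SAMPLE


def cycles_to_samples(cycles):
    """Express a latency in clock cycles as an equivalent number of samples."""
    return cycles / CLOCKS_PER_SAMPLE
```

For example, the 68-cycle latency of the synchronization upper path corresponds to 544/25 ≈ 21.8 samples, and 1000 cycles is the one-symbol wait used by the RX PHY state machine.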

Figure 18 RX IQ Processing Latency

The overall latency of the RX I/Q Processing module for the last sample of the OFDM symbol is 665 clock cycles (refer to Figure 18). The FFT latency is smaller than the value reported by the Xilinx IP Generator, because the reported value includes loading all 256 samples. During the packet, the FFT executes and unloads samples within 617 clock cycles after the last sample arrives. The remaining clock cycles of each OFDM symbol are used to transfer data from the input FIFO to the FFT core. By the time the last sample is available on the input, the FIFO is empty, and the sample is passed to the core as fast as possible. The delay of the FIFO is unknown, as indicated in Figure 18. All other modules have a fixed latency.

3.1.1.5 RX Bit Processing

The RX bit-processing chain deinterleaves, decodes, and descrambles the data. It provides the received bits to the RX PHY state machine and the PSDU bytes to the MAC. The block diagram is shown in Figure 19. Details about data types and control information are given in Table 8.

Figure 19 RX Bit Processing Block Diagram

Module	Output Data Type	Output Control Information
RX IQ Processing	CFX 2.14	Field Map, Subcarrier Timing
Packet Termination		Field Map, Subcarrier Timing
Align Configuration		Bit Processing Configuration
LLR Demapper	FXP 8.0 Array (8 elements)
Softbit Serializer	FXP 8.0
Deinterleaver	FXP 8.0
Depuncturing	FXP 8.0, Boolean Array (2 elements)
Viterbi	Boolean
Descrambler	Boolean
PSDU Masking	U8	-

Table 8 RX Bit Processing Data Types and Control Information

The first module of the chain is the Packet Termination module. It passes only samples whose OFDM symbol index in the subcarrier timing cluster is below the value given by the RX PHY state machine, which ensures that the packet end is processed correctly. When the current packet reception is aborted, the RX PHY state machine sets the last OFDM symbol index to 0, and this module then terminates all I/Q data.

The next block is the Align Configuration module, which has two functions. The first function is to align the bit-processing configuration cluster from the RX PHY state machine with the start of a new OFDM symbol. All other control information is terminated in this module. The bit-processing configuration is transferred parallel to the data stream, and it contains the following information:

Packet format
Bandwidth
Modulation
Coding rate
PSDU length (in bytes)
Valid bits in current OFDM symbol
Descrambler enable flag
Viterbi flush required flag

The second function is the filtering of all noncoded fields for downstream modules. It uses the field map provided by the RX I/Q processing chain.
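The bit-processing configuration cluster can be pictured as a record like the one below. The Python types, field names, and enum values are hypothetical and only mirror the list above. In the example instance for L-SIG reception, only the 802.11a format, 20 MHz bandwidth, MCS 0 (BPSK, rate 1/2), and disabled scrambler come from this paper; the remaining values are illustrative guesses.

```python
from dataclasses import dataclass
from enum import Enum
from fractions import Fraction


class PacketFormat(Enum):
    IEEE_802_11A = "802.11a"
    IEEE_802_11AC = "802.11ac"


class Modulation(Enum):
    # value = coded bits per subcarrier (N_BPSCS)
    BPSK = 1
    QPSK = 2
    QAM16 = 4
    QAM64 = 6
    QAM256 = 8


@dataclass(frozen=True)
class BitProcessingConfig:
    packet_format: PacketFormat
    bandwidth_mhz: int             # 20 or 40
    modulation: Modulation
    coding_rate: Fraction          # 1/2, 2/3, 3/4, or 5/6
    psdu_length_bytes: int
    valid_bits_in_symbol: int
    descrambler_enable: bool
    viterbi_flush_required: bool


# Illustrative configuration for L-SIG reception.
L_SIG_CONFIG = BitProcessingConfig(
    packet_format=PacketFormat.IEEE_802_11A,
    bandwidth_mhz=20,
    modulation=Modulation.BPSK,
    coding_rate=Fraction(1, 2),
    psdu_length_bytes=0,          # guess: unknown before L-SIG is decoded
    valid_bits_in_symbol=24,      # the SIGNAL field carries 24 bits
    descrambler_enable=False,
    viterbi_flush_required=True,  # guess: the single L-SIG code word is flushed
)
```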

The I/Q samples in the coded fields are processed by the LLR Demapper block. Based on the given modulation scheme, it outputs an array of up to eight softbits. Each softbit is a signed 8-bit fixed-point value (FXP 8.0).

The Softbit Serializer module takes this array of softbits and provides the serialized stream on the output. The number of valid softbits in the array is derived from the modulation. An internal FIFO is used to buffer softbits on the input.
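A cycle-level sketch of the Softbit Serializer, assuming that at most one softbit array arrives per clock cycle and one softbit leaves per clock cycle. The number of valid softbits per array is the number of coded bits per subcarrier, N_BPSCS. The function name and the back-to-back arrival pattern are assumptions; the real input contains pilot gaps.

```python
from collections import deque

N_BPSCS = {"BPSK": 1, "QPSK": 2, "16-QAM": 4, "64-QAM": 6, "256-QAM": 8}


def serialize(arrays, modulation):
    """Return (softbits, cycles from the last array's arrival to the last softbit out).

    Each clock cycle: at most one 8-element array enters (its first N_BPSCS entries
    are valid and go into the FIFO), and at most one softbit leaves the FIFO.
    """
    n_valid = N_BPSCS[modulation]
    pending, fifo, out = deque(arrays), deque(), []
    cycle, last_arrival = 0, None
    while pending or fifo:
        if pending:
            fifo.extend(pending.popleft()[:n_valid])
            last_arrival = cycle
        if fifo:
            out.append(fifo.popleft())
        cycle += 1
    drain = 0 if last_arrival is None else cycle - 1 - last_arrival
    return out, drain
```

With 108 back-to-back 256-QAM arrays, 756 softbits are still queued when the last array arrives; adding the two register stages gives the 758 clock cycles discussed later in this section. With BPSK, nothing is left in the queue.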

The Deinterleaver module reverts the BCC interleaver operations defined in section 18.3.5.7 of [1] and section 22.3.10.8 of [2]. The write operation into the memory is based on equation 22-82 of [2], which reverses the second permutation. The read operation is based on equation 22-77 of [2], which reverses the first permutation. Reading is started as soon as all softbits of the current OFDM symbol are saved to memory. A double page memory is used, which enables reading and writing at the same time.
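The index arithmetic of the BCC interleaver can be sketched as follows. The sketch computes the transmitter's interleaver position j for each input index k (first and second permutation, with N_COL = 16 for 802.11a and 13 or 18 for 20 or 40 MHz VHT) and deinterleaves by reading through that mapping. It does not reproduce the framework's write and read address sequences based on Equations 22-82 and 22-77; function names are illustrative.

```python
def interleave_index(k, n_cbps, n_bpscs, n_col=16):
    """Position j of coded bit k after the first and second BCC interleaver permutations."""
    n_row = n_cbps // n_col
    s = max(n_bpscs // 2, 1)
    i = n_row * (k % n_col) + k // n_col                             # first permutation
    return s * (i // s) + (i + n_cbps - (n_col * i) // n_cbps) % s   # second permutation


def interleave(bits, n_bpscs, n_col=16):
    out = [None] * len(bits)
    for k, bit in enumerate(bits):
        out[interleave_index(k, len(bits), n_bpscs, n_col)] = bit
    return out


def deinterleave(softbits, n_bpscs, n_col=16):
    """Undo both permutations: softbit k is read from interleaved position j(k)."""
    n = len(softbits)
    return [softbits[interleave_index(k, n, n_bpscs, n_col)] for k in range(n)]
```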

Based on Figures 18-9 and 20-11 of [1], the Depuncturer module converts the incoming bit-stolen data sequence into the bit-inserted data sequence. Each bit gets a puncturing flag that indicates whether it was transmitted or left out. One element of A and the corresponding element of B are combined into an array of two elements, where A and B are as defined in Figures 18-9 and 20-11 of [1].
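The depuncturing step can be modeled as below, with one (transmit A, transmit B) pattern per coding rate. The patterns reflect the bit-stealing figures for rates 2/3, 3/4, and 5/6; the neutral value 0 inserted for stolen softbits and the function name are assumptions. In the framework, the puncturing flags, not the inserted value, tell the Viterbi core which bits to ignore.

```python
# (transmit A, transmit B) for each position of the puncturing period
PATTERNS = {
    "1/2": [(1, 1)],
    "2/3": [(1, 1), (1, 0)],
    "3/4": [(1, 1), (1, 0), (0, 1)],
    "5/6": [(1, 1), (1, 0), (0, 1), (1, 0), (0, 1)],
}
ERASURE = 0  # neutral softbit inserted for a stolen bit (assumption)


def depuncture(softbits, rate):
    """Return ([a, b], [a_punctured, b_punctured]) pairs from a bit-stolen softbit stream."""
    pattern, stream, pairs = PATTERNS[rate], iter(softbits), []
    pos = 0
    while True:
        keep_a, keep_b = pattern[pos % len(pattern)]
        try:
            a = next(stream) if keep_a else ERASURE
            b = next(stream) if keep_b else ERASURE
        except StopIteration:          # an incomplete pair at the end is dropped
            return pairs
        pairs.append(([a, b], [not keep_a, not keep_b]))
        pos += 1
```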

The array is given to the Viterbi decoder, which is a wrapper for the Xilinx Viterbi core. The softbits are converted to the Xilinx format before passing them on to the s_axis_data_tdata input. The s_axis_data_tuser input is used to provide the punctured flags from the Depuncturer module and the block valid flag. The block valid has the same latency as the data path. After the last softbit of the current code word, the Viterbi is flushed to get the remaining bits out of the core. For flushing, strong zeros are pushed to the input with the block valid set to FALSE. On the output of the core, the block valid information can filter out the zeros from the flushing operation. The bits of the code word are provided at the output.

The Descrambler module processes the bits at the output of the decoder. If the scrambler is disabled, the input bits are bypassed to the output. On activation, detected by the rising edge of the enable signal, the Descrambler module assumes it is receiving a packet starting with the SERVICE field and uses the first seven bits to extract the scrambler seed. Those initial bits are overwritten by zeros. Afterward, all bits are descrambled with the recovered seed until deactivation.
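The seed extraction relies on the first seven SERVICE bits being zero before scrambling, so the first seven received bits equal the scrambler's own output sequence, which is also the shift-register content after seven steps. The sketch below models the x^7 + x^4 + 1 scrambler and such a descrambler; the function names are illustrative, and loading the register state directly is an assumption about how the recovered seed is used.

```python
def scramble(bits, state):
    """802.11 scrambler x^7 + x^4 + 1; state = [x1, ..., x7], nonzero."""
    s, out = list(state), []
    for b in bits:
        feedback = s[3] ^ s[6]                 # x4 XOR x7
        out.append(b ^ feedback)
        s = [feedback] + s[:6]                 # shift; feedback enters x1
    return out


def descramble(bits):
    """Recover the register from the first 7 bits (SERVICE bits 0-6 are zero before
    scrambling), output zeros for them, then descramble the remaining bits."""
    s = list(reversed(bits[:7]))               # x1 = 7th bit, ..., x7 = 1st bit
    out = [0] * 7
    for b in bits[7:]:
        feedback = s[3] ^ s[6]
        out.append(b ^ feedback)
        s = [feedback] + s[:6]
    return out
```

For an all-ones initial state and all-zero input, scramble returns a sequence beginning 0000111011110010, the start of the 127-bit sequence given in the standard.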

The output is transmitted to the RX PHY state machine. Before sending to the MAC, the bit stream is filtered by the PSDU Masking module. The SERVICE, TAIL, and PAD fields are removed, and the bits are concatenated to bytes. The length of the PSDU is given by the configuration. Padding bits are removed. For the 802.11ac format, parts of the PAD field may be included in the PSDU data stream (refer to Section 3.1.1.6 for more information).
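A minimal sketch of the PSDU masking for the 802.11a case, assuming the decoded bits arrive in transmission order. It drops the 16-bit SERVICE field, packs the following bits into bytes least significant bit first (the 802.11 bit order), and stops after the PSDU length so that TAIL and PAD bits are discarded. The function name is hypothetical.

```python
SERVICE_BITS = 16


def mask_psdu(bits, psdu_length):
    """Drop SERVICE, pack the next psdu_length bytes LSB first, discard TAIL and PAD."""
    payload = bits[SERVICE_BITS:SERVICE_BITS + 8 * psdu_length]
    if len(payload) < 8 * psdu_length:
        raise ValueError("fewer decoded bits than the PSDU length requires")
    return bytes(
        sum(bit << i for i, bit in enumerate(payload[n:n + 8]))
        for n in range(0, len(payload), 8)
    )
```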

Module	Output Timing
Packet Termination	256 subcarriers burstwise / OFDM symbol
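Align Configuration	N_SD data subcarriers / OFDM symbol (48-108) burstwise; gaps due to pilots
LLR Demapper	N_SD data subcarriers / OFDM symbol (48-108) burstwise; gaps due to pilots
Softbit Serializer	N_CBPS coded softbits / OFDM symbol (48-864) burstwise; gaps due to pilots
Deinterleaver	N_CBPS coded softbits / OFDM symbol (48-864) burstwise
Depuncturing	N_DBPS encoded stream values or data bits / OFDM symbol (24-720); peak rate: 1 value / clock cycle
Viterbi	N_DBPS encoded stream values or data bits / OFDM symbol (24-720); peak rate: 1 value / clock cycle
Descrambler	N_DBPS encoded stream values or data bits / OFDM symbol (24-720); peak rate: 1 value / clock cycle
PSDU Masking	N_DBPS/8 data bytes / OFDM symbol (3-90); peak rate: 1 byte / 10 clock cycles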

Table 9 RX Bit Processing Transfer Timing

The output timing of the submodules is given in Table 9. The number of values depends on the format, bandwidth, and MCS. The variables are defined in Tables 18-4 and 18-5 of [1] and Tables 22-30 and 22-38 of [2]. The values in parentheses give the minimum and maximum of the valid range. The minimum is based on L-SIG, which uses non-HT mode with MCS 0; the maximum is based on VHT 40 MHz transmissions using MCS 9.
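The range limits in Table 9 follow from N_CBPS = N_SD × N_BPSCS and N_DBPS = N_CBPS × R. The sketch below, with a hypothetical function name, reproduces the two corner cases.

```python
from fractions import Fraction


def bits_per_symbol(n_sd, n_bpscs, rate):
    """Return (N_CBPS, N_DBPS, N_DBPS / 8) for one OFDM symbol."""
    n_cbps = n_sd * n_bpscs
    n_dbps = n_cbps * rate
    if n_dbps.denominator != 1:
        raise ValueError("N_DBPS must be an integer for a valid configuration")
    return n_cbps, int(n_dbps), n_dbps / 8
```

N_DBPS/8 is not an integer for every MCS (for example, 36/8 for 802.11a at 9 Mb/s), which is why the byte count is kept as a fraction.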

The RX I/Q Processing module provides 256 subcarriers in one burst. The first module that changes this pattern is the Align Configuration module: only subcarriers belonging to coded fields remain on its output. Since there are multiple pilot tones, this stream contains gaps. The serialized stream at the output of the Softbit Serializer module can have many more valid items per OFDM symbol. Nevertheless, the pilot gaps remain with BPSK modulation, where the LLR Demapper translates each subcarrier into one softbit. The gaps disappear after the Deinterleaver module because the softbit stream is read burstwise from the internal memory. The Depuncturer adds gaps to this data stream whenever both the A and the B bit of a pair are available, because it must consume two input softbits for that output pair; inserting punctured bits does not produce gaps. The Xilinx Viterbi core generates output data as soon as new bits are provided to the input, so the output pattern is not changed. The PSDU masking reduces the data rate by a factor of 8. The peak rate is reached at a coding rate of 5/6.

Figure 20 RX Bit Processing Latency

The latency of the RX Bit Processing chain depends on the format, bandwidth, and MCS. Similar to Table 9, Figure 20 refers to the two corner cases L-SIG and highest MCS at highest bandwidth. The latency is given for the last subcarrier of the packet generated by the RX I/Q Processing module. Most of the modules have a fixed latency.

The delay of the Softbit Serializer depends on the modulation. For BPSK, each subcarrier is mapped to one softbit, so the serialization does not add any delay, and the internal FIFO is empty when the last value arrives. Although the FIFO delay is unknown in general, the latency in this case is 2 clock cycles because of internal registers. For 256-QAM, each softbit array has to be split into eight softbits on the output. When the last value arrives, only 108 (N_SD) of the 864 (N_CBPS) softbits have been output. The remaining 756 softbits plus the two register stages result in a latency of 758 clock cycles.

The Deinterleaver has to store one complete OFDM symbol of softbits. The read operation starts as soon as the last value arrives, and N_CBPS softbits have to be read before the last softbit is available on the output of the Deinterleaver. The pipeline stages add another 11 clock cycles.

The latency of the Viterbi decoder is determined by the Xilinx Viterbi decoder core; one pipeline stage adds one more clock cycle.

The latency for other configurations can be calculated using Equation 7 with values from Table 18-4, 18-5 of [1] or Table 22-30, 22-38 of [2].

Equation 7 RX Bit Processing Latency
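Equation 7 is not reproduced in this text. The following sketch collects only the latency terms stated explicitly in this section: the Softbit Serializer (assuming back-to-back input), the Deinterleaver, and the extra pipeline stage around the Viterbi core. The Xilinx Viterbi core latency is left as a parameter, the other fixed-latency modules are omitted, and all names are hypothetical.

```python
SERIALIZER_REGISTERS = 2      # register stages in the Softbit Serializer
DEINTERLEAVER_PIPELINE = 11   # pipeline stages in the Deinterleaver
VITERBI_WRAPPER_PIPELINE = 1  # extra stage around the Xilinx Viterbi core


def serializer_latency(n_sd, n_cbps):
    """Softbits still queued when the last subcarrier arrives, plus the registers.
    For BPSK, n_cbps == n_sd, so only the registers remain."""
    return (n_cbps - n_sd) + SERIALIZER_REGISTERS


def deinterleaver_latency(n_cbps):
    """A full OFDM symbol of softbits is read out after the last one is written."""
    return n_cbps + DEINTERLEAVER_PIPELINE


def viterbi_latency(core_latency):
    """core_latency depends on the Xilinx core configuration (not given here)."""
    return core_latency + VITERBI_WRAPPER_PIPELINE
```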

3.1.1.6 RX PHY State Machine

The RX PHY state machine, which is based on Figure 22-37 of [2], provides the configuration for RX IQ and RX Bit Processing modules and generates indications for the MAC. Notice that the synchronization is controlled indirectly by the state machine. The RX end indication is used to rearm the synchronization. Notice also that the PHY is not capable of decoding VHT MU PPDUs, so the reception of VHT-SIG-B is skipped as described in section 22.3.21 of [2]. The state diagram is given in Figure 21. The word timing in this diagram refers to the timestamp when the last sample of the packet was received, which is included in the PHY RX end indication.

Figure 21 RX PHY State Machine States

Initialization

This is the startup state. In this state, the internal configuration is reset such that it can receive the first coded field in the packet (L-SIG in the primary subband). This setting consists of the 802.11a format, 20 MHz bandwidth, disabled scrambler, and MCS 0. The unknown length of the packet means that the last OFDM symbol index is set to the maximum unsigned 16-bit integer value of 65,535.

RX L-SIG

As soon as the synchronization detects a packet, the processing chain uses the configuration from the Initialization state to provide the 24 bits of the SIGNAL field to the RX PHY state machine. The received bits are verified to be a valid L-SIG field based on Section 18.3.4 of [1]. The L-SIG check verifies the following conditions:

R4 of RATE field is one
Bit 4 (reserved) is zero
SIGNAL TAIL field is all zeros
Parity bit 17 matches (even parity over bits 0 to 16)
LENGTH > 0

The result of the check is used as the condition L-SIG valid in the state machine. As soon as this condition is evaluated, the state machine leaves this state.

If L-SIG is invalid, the reception of the current packet is aborted. The last OFDM symbol index is set to 0. This forces the Sample Timing Generation module of the RX I/Q processing chain and the Packet Termination module of the RX bit processing chain to finish the current OFDM symbol and stop. Because no packet length information is available at this point in time, the timing information is marked as invalid. Furthermore, the internal format violation flag is set.

If a valid L-SIG was received, the next state depends on the packet format selected from the host. In the 802.11a format, all necessary information is available from L-SIG interpretation to start reception of the data. The index of the last OFDM symbol is calculated based on equation 18-11 of [1]. This index as well as MCS and PSDU length are provided to the processing chain. In addition, the PHY RX start indication is generated, and the packet frame timing is set to valid.
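The L-SIG checks and the number of DATA symbols can be sketched in Python as follows. The bit positions follow the SIGNAL field layout of Section 18.3.4 of [1] (RATE in bits 0 to 3, reserved bit 4, LENGTH in bits 5 to 16 with the least significant bit first, parity bit 17, and TAIL in bits 18 to 23), and the symbol count uses the standard formula N_SYM = ceil((16 + 8 × LENGTH + 6) / N_DBPS). How the framework maps N_SYM to its last OFDM symbol index is not shown; function names are illustrative.

```python
# RATE bits R1..R4 -> N_DBPS for the eight 802.11a rates (6 to 54 Mb/s)
RATE_TO_NDBPS = {
    (1, 1, 0, 1): 24, (1, 1, 1, 1): 36, (0, 1, 0, 1): 48, (0, 1, 1, 1): 72,
    (1, 0, 0, 1): 96, (1, 0, 1, 1): 144, (0, 0, 0, 1): 192, (0, 0, 1, 1): 216,
}


def check_lsig(bits):
    """Return (valid, rate_bits, length) for the 24 decoded SIGNAL bits."""
    rate = tuple(bits[0:4])
    length = sum(bit << i for i, bit in enumerate(bits[5:17]))   # LSB first
    valid = (
        rate[3] == 1                     # R4 of RATE is one
        and bits[4] == 0                 # reserved bit
        and not any(bits[18:24])         # SIGNAL TAIL is all zeros
        and sum(bits[0:18]) % 2 == 0     # even parity: bit 17 covers bits 0-16
        and length > 0
    )
    return valid, rate, length


def num_data_symbols(length, n_dbps):
    """N_SYM = ceil((16 + 8 * LENGTH + 6) / N_DBPS), using integer arithmetic."""
    return -(-(16 + 8 * length + 6) // n_dbps)
```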

If the L-SIG was valid and the format is set to 802.11ac, further information from VHT-SIG-A is needed to decode the packet. At this point, only the index of the last OFDM symbol can be calculated, which also makes the packet timing known. The format still remains 802.11a because VHT-SIG-A is coded like an L-SIG with MCS 0.

RX VHT-SIG-A

Similar to the RX L-SIG state, the processing chain is configured to provide the bits of the VHT-SIG-A to the RX PHY state machine. The code word of VHT-SIG-A is provided in two OFDM symbols. The Viterbi decoder flush required flag in the bit processing configuration cluster is set for the second OFDM symbol. This bit-processing configuration cluster is aligned with the data stream in the RX Bit Processing block by the Align Configuration module (see Section 3.1.1.5). Hence, accurate indication of the current OFDM symbol index is available from the RX Bit Processing module and can be used to set the Viterbi decoder flush required flag.

The 48 bits of the VHT-SIG-A are captured, and its validity is verified based on Section 22.3.8.3.3 of [2]. The condition VHT-SIG-A valid is based on the following checks:

Bandwidth is supported by PHY
Group ID indicates VHT SU PPDU (0 or 63)
Short GI is set to zero (disabled)
B2 is set to zero (BCC encoding)
CRC checksum is matching

If VHT-SIG-A is invalid, the reception is aborted as in the RX L-SIG state, except that the timing information is already known from the successful L-SIG reception.

If VHT-SIG-A is valid, the state machine configures the bandwidth, format, MCS, and PSDU length in the processing chain. Since VHT-SIG-A gives no specific length information, the PSDU length is calculated using Equation 22-112 of [2]. This PSDU length is greater than or equal to the exact payload size, so padding bits are included in the PSDU and passed to the MAC.

Wait for last sample

This state is used to abort a running reception when an invalid signal field occurs. The state machine waits until the Sample Timing Generation module of the RX I/Q Processing module indicates that the last sample of the current OFDM symbol has been processed. The global timestamp is captured at this point in time to provide the end of the packet as the new frame timing. In addition to this new frame timing, the PHY RX end indication carries the internal information about format violation and timing validity.

End of PSDU RX

This state is entered when the signaling information was received correctly and the data field is to be decoded. Similar to the RX VHT-SIG-A state, flushing the Viterbi decoder is enabled only for the last OFDM symbol of the packet, which is identified by the last OFDM symbol index computed in the RX L-SIG state. Furthermore, in 802.11a format, the number of valid data bits in the last OFDM symbol before tail and padding is known and configured to the Viterbi module so that the TAIL bits are the last to be decoded. In 802.11ac format, the padding is inserted before the tail bits, so the valid bits limitation is not used; all bits of the last OFDM symbol are processed by the Viterbi decoder.

The state is left as soon as the PSDU Masking module in RX bit processing indicates that the last byte of PSDU has been decoded. A PHY RX end indication with the frame timing information is generated. Similar to the wait for last sample state, the frame timing is based on the global timestamp captured when the last sample of the packet was processed in Sample Timing Generation module in RX I/Q Processing.

Wait for packet end

The state is left after 1000 clock cycles, which is the duration of one OFDM symbol. This waiting period is required because either the RX IQ processing or RX bit processing chain or both could still be working on samples that must be terminated before setting the configuration for a new L-SIG reception. Since the processing is based on OFDM symbol boundaries, after the duration of one symbol, all modules are in idle state.
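The state transitions described above can be summarized in a small Python model. The enum, the function, and the event flags are hypothetical names for the conditions in the text, and the model ignores the outputs (configuration updates and indications) that each state produces.

```python
from enum import Enum, auto


class RxState(Enum):
    INITIALIZATION = auto()
    RX_L_SIG = auto()
    RX_VHT_SIG_A = auto()
    WAIT_FOR_LAST_SAMPLE = auto()
    END_OF_PSDU_RX = auto()
    WAIT_FOR_PACKET_END = auto()


SYMBOL_CLOCKS = 1000  # one OFDM symbol: 4 us at 250 MHz


def next_state(state, *, ac_format=False, sig_valid=None, last_sample_done=False,
               last_byte_done=False, idle_clocks=0):
    """One evaluation of the transition conditions; sig_valid is None until checked."""
    if state is RxState.INITIALIZATION:
        return RxState.RX_L_SIG                  # configuration reset for L-SIG
    if state is RxState.RX_L_SIG and sig_valid is not None:
        if not sig_valid:
            return RxState.WAIT_FOR_LAST_SAMPLE  # abort: last OFDM symbol index = 0
        return RxState.RX_VHT_SIG_A if ac_format else RxState.END_OF_PSDU_RX
    if state is RxState.RX_VHT_SIG_A and sig_valid is not None:
        return RxState.END_OF_PSDU_RX if sig_valid else RxState.WAIT_FOR_LAST_SAMPLE
    if state is RxState.WAIT_FOR_LAST_SAMPLE and last_sample_done:
        return RxState.WAIT_FOR_PACKET_END
    if state is RxState.END_OF_PSDU_RX and last_byte_done:
        return RxState.WAIT_FOR_PACKET_END
    if state is RxState.WAIT_FOR_PACKET_END and idle_clocks >= SYMBOL_CLOCKS:
        return RxState.INITIALIZATION
    return state                                 # no condition met: stay
```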

3.1.1.7 Timing

The overall timing of the RX chain, including Synchronization, I/Q and bit processing, and the RX PHY state machine, is shown in Figure 22 for 802.11a packets. Time is represented on the horizontal axis. The vertical axis shows selected modules that have important outputs or that change the transfer timing. The colored rectangles correspond to the data values of one OFDM symbol; their size and placement along the time axis reflect the latencies and transfer timings of the modules. The black arrows show important control signals between the processing chain and the state machine and between the PHY and the MAC. The arrows are placed according to the timing information; neither their start nor their end position is necessarily related to the module that generates or consumes the control information.

Figure 22 RX PHY Timing for 802.11a Packets

Figure 22 shows the timing of the receiver for an 802.11a packet with MCS 7 and N_SYM = 3. RF and Synchronization add the latency between over-the-air transmission and the synchronization output. The first OFDM symbol after the packet start is L-LTF-2. The RX PHY state machine has configured the RX IQ and Bit Processing modules to receive L-SIG.

As soon as the last sample of the L-SIG field is available in the FFT, the execution starts. The burstwise unloading of data is done in parallel to reception of the next OFDM symbol on the FFT module input. L-LTF-2 is terminated in the Align Configuration module of the RX Bit Processing block.

As the first dynamic field, the L-SIG is the first field handled by the RX Bit Processing. L-SIG uses MCS 0, so it has only 24 data bits, and the latency is much smaller than one OFDM symbol duration. The decoding and flushing of the Viterbi decoder takes most of the time. The RX PHY state machine can update the configuration cluster for the reception of the coded data symbols based on the L-SIG field contents long before the next OFDM symbol is unloaded by the FFT.

Starting with L-DATA-1, the RX Bit Processing chain uses MCS 7. This results in a larger number of bits at the Softbit Serializer module output. The Viterbi decoder holds about 200 bits because of its internal latency. Because of this storage, the output of the RX Bit Processing chain is not aligned to OFDM symbol boundaries: the Viterbi decoder output for each DATA symbol starts with the remaining bits from the previous OFDM symbol. All other DATA symbols are similar in timing.

The code word ends in the last OFDM symbol, and the Viterbi decoder is flushed. Because of padding bits, the decoding can end before the last bit has been received. The PSDU Masking module notifies the RX PHY state machine, which sends the PHY RX end indication with the timestamp of the last sample of the packet and goes to the Wait for packet end state. The RX PHY state machine remains in this state to terminate the remaining bits of the last OFDM symbol in the RX Bit Processing chain. After passing through the Initialization state, the RX chain is ready to process a new packet.

Figure 23 RX PHY Timing for Invalid Packets

Figure 23 illustrates the termination of the reception when L-SIG is not valid; an invalid VHT-SIG-A is handled similarly. As in Figure 22, L-SIG is provided to the RX PHY state machine. Once the L-SIG field contents are determined to be invalid, the last OFDM symbol index is set to zero, and the state machine goes to the Wait for last sample state (shortened to wait in Figure 23).

At this point in time, the FFT is filled with data from the next OFDM symbol and cannot be aborted immediately. The Sample Timing Generation module in the RX IQ Processing block completes the current OFDM symbol and notifies the RX PHY state machine after the last sample. The RX PHY state machine then switches to the Wait for packet end state and waits for the duration of one OFDM symbol. During this time, the FFT unloads the remaining data, which is terminated in the Packet Termination module of the RX Bit Processing chain.

Figure 24 RX PHY Timing for 802.11ac Packets

Figure 24 shows the reception of an 802.11ac packet with a bandwidth of 40 MHz at MCS 9. Since the process before L-SIG is the same as in Figure 22, it is left out. After the L-SIG field, the RX PHY state machine switches to the RX VHT-SIG-A state and updates the format to 802.11ac. The Demapper module in the RX I/Q Processing chain enumerates further subcarriers for 802.11ac. The timing of VHT-SIG-A reception in the RX IQ and Bit Processing chains is similar to L-SIG. Because the VHT-SIG-A field has only a small number of bits and the Viterbi decoder is flushed at the end of VHT-SIG-A2, the 48 bits arrive at the RX PHY state machine in one burst.

If VHT-SIG-A is determined to be invalid, the reception would be aborted similar to the L-SIG invalid case illustrated in Figure 23. In this case, VHT-STF would be the last OFDM symbol getting out of the FFT.

If VHT-SIG-A is valid, the bandwidth and MCS parameters are obtained from the field and used to set the configuration clusters for the processing chain. The PHY RX start indication is sent to the MAC, and the RX PHY state machine transitions to the End of PSDU RX state and waits for the end of decoding.

The next OFDM symbols contain training sequences and VHT-SIG-B. This information is not handled in RX Bit processing chain.

In this example, the RX Bit Processing uses MCS 9, which applies 256-QAM modulation. This results in a large number of bits generated by the LLR Demapper, which are serialized by the Softbit Serializer. For this large number of bits, reading and writing of the Deinterleaver memory overlap, which is why the module uses a double page memory. The Viterbi decoder core is flushed on the last OFDM symbol as in 802.11a format. The state machine sends the PHY RX end indication once the last byte has been provided to the MAC.

3.1.1.8 MAC RX

The MAC RX module implements low-level, latency-critical MAC reception functionality, that is, validation and recognition of received packets and triggering of ACK responses. The inputs to the module are PSDUs delivered from the PHY together with associated control information. The MAC RX module performs frame validation, consisting of subframe detection for packets received in 802.11ac format and the FCS check for all received MPDUs. Subsequently, MPDU type recognition is performed. For supported frame types, MAC header evaluation and address filtering are done. Finally, MSDU extraction is performed by a configurable filter operation. In addition to these packet-handling functions, the MAC RX module also handles CCA information from the PHY RX and forwards frame timing information from the PHY RX to the MAC TX.

As shown in Figure 25, the module MAC RX consists of five major submodules:

A-MPDU Frame Validate
MPDU Validate
MPDU Recognize
MPDU Filter
Channel State

All five submodules are described in more detail in the following sections. Notice that the overall internal structure of MAC RX roughly follows the concept of the IEEE 802.11 SDL specifications. Refer to Section J.5 of [1] for more information about the SDL specifications.

Figure 25 MAC RX Block Diagram

The module A-MPDU Frame Validate applies only to packets received in 802.11ac format. It checks the MPDU delimiter and provides the contained information to subsequent modules. For packets received in 802.11a format, the module passes all data through unchanged. In version 1.1 of the 802.11 Application Framework, this module can handle only A-MPDUs with one A-MPDU subframe, that is, with a single MPDU.

The module MPDU Validate performs frame validation by means of the FCS field. The FCS check uses the IEEE 32-bit CRC as specified in [1] Section 8.2.4.8. During the check, the 4 FCS bytes are removed. The module implements a small state machine to extract control information, such as the frame end timing validity and value, from the MPDU start indication primitive and the PHY RX end indication primitive. This information is collected in the RX info indication and forwarded to the Channel State module.
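As a sketch of the FCS check (written in Python, not in the framework's LabVIEW FPGA code), the snippet below relies on the fact that the 802.11 FCS uses the same IEEE CRC-32 as Ethernet, which Python's `zlib.crc32` computes, and that the FCS is appended least significant byte first. The helper names `fcs_valid` and `append_fcs` are hypothetical.

```python
import zlib
from typing import Tuple

def append_fcs(mpdu_block: bytes) -> bytes:
    """Complete an MPDU block by appending the 4-byte CRC-32 FCS (LSB first)."""
    return mpdu_block + zlib.crc32(mpdu_block).to_bytes(4, "little")

def fcs_valid(mpdu: bytes) -> Tuple[bool, bytes]:
    """Check the trailing FCS and return (result, MPDU with the FCS removed)."""
    if len(mpdu) < 4:
        return False, b""
    body, fcs = mpdu[:-4], mpdu[-4:]
    expected = zlib.crc32(body).to_bytes(4, "little")
    return fcs == expected, body
```

The same `append_fcs` step corresponds to the FCS insertion that MAC TX performs when it turns an MPDU block into an MPDU.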

The module MPDU Recognize detects the frame type of received MAC PDUs. For supported frame types, MAC header evaluation is executed, including the destination MAC address check (for all frames) and source MAC address extraction (for frames with address field 2, such as data frames). Currently supported frame types are Data and ACK. For Data frames received with a correct FCS and a matching address, an ACK transmission request is generated and forwarded to MAC TX.
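The frame type recognition and the ACK decision can be sketched in Python as follows. The sketch assumes the standard 802.11 header layout (Frame Control byte 0 carries the type in bits 2-3 and the subtype in bits 4-7; Address 1 starts at byte 4 and Address 2 at byte 10) and treats Address 1 as the destination address that is checked. `recognize` and `RecognizeResult` are hypothetical names, and the FCS result is assumed to be passed in from the validation step.

```python
from dataclasses import dataclass
from typing import Optional

TYPE_CONTROL, TYPE_DATA = 1, 2
SUBTYPE_ACK, SUBTYPE_DATA = 13, 0

@dataclass
class RecognizeResult:
    kind: str                 # "data", "ack" or "unsupported"
    addr_match: bool          # Address 1 equals this station's address
    source: Optional[bytes]   # Address 2, extracted for data frames only
    ack_request: bool         # True if MAC TX should send an ACK

def recognize(mpdu: bytes, own_addr: bytes, fcs_ok: bool) -> RecognizeResult:
    if len(mpdu) < 10:                            # too short for FC, Duration, Address 1
        return RecognizeResult("unsupported", False, None, False)
    ftype = (mpdu[0] >> 2) & 0x3                  # Frame Control bits 2-3
    subtype = (mpdu[0] >> 4) & 0xF                # Frame Control bits 4-7
    addr_match = mpdu[4:10] == own_addr           # Address 1 follows FC and Duration
    if ftype == TYPE_DATA and subtype == SUBTYPE_DATA and len(mpdu) >= 24:
        return RecognizeResult("data", addr_match, mpdu[10:16], fcs_ok and addr_match)
    if ftype == TYPE_CONTROL and subtype == SUBTYPE_ACK:
        return RecognizeResult("ack", addr_match, None, False)
    return RecognizeResult("unsupported", addr_match, None, False)
```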

The module MPDU Filter implements a configurable filter operation on received MPDUs. The filter can be configured to block MPDUs with an FCS error, an address mismatch, or an unsupported frame type. The filter also allows removal of MAC headers. The filtered received data and control information are converted into a serial data stream and transferred to the host through a target-to-host FIFO.

The default filter configuration for 802.11 Application Framework version 1.1 is as follows:

Remove header: TRUE
Block unsupported frame types: TRUE
Block FCS errors: TRUE
Block address mismatch: TRUE
Block header recognize error: TRUE

As a result of this configuration, only the frame body of received Data MPDUs with a correct FCS and a matching address is sent to the host. For version 1.1 of the 802.11 Application Framework, this is sufficient because no further MAC operation is implemented on the host. If you want to perform MAC operations on the host, you must adapt the filter configuration. For example, if the MAC header field Duration is evaluated on the host, Remove header must be set to FALSE. The MAC headers of ACK and Data frames are then forwarded to the host. After evaluating the Duration field, the host completes MAC header removal for Data frames.
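A minimal Python sketch of the filter decision with the default configuration listed above. The header lengths (24 bytes for Data, 10 bytes for ACK, FCS already removed) come from the standard 802.11 frame formats. The function and field names are hypothetical, and treating an empty frame body as "nothing to forward" is an assumption, made so that ACK frames are dropped when headers are removed, as described above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FilterConfig:
    """Default values mirror the version 1.1 configuration."""
    remove_header: bool = True
    block_unsupported: bool = True
    block_fcs_error: bool = True
    block_addr_mismatch: bool = True
    block_header_error: bool = True

HEADER_LEN = {"data": 24, "ack": 10}   # FCS is assumed to be stripped already

def mpdu_filter(mpdu: bytes, kind: str, fcs_ok: bool, addr_match: bool,
                header_ok: bool, cfg: FilterConfig) -> Optional[bytes]:
    """Return the bytes forwarded to the host, or None if the MPDU is blocked."""
    if cfg.block_fcs_error and not fcs_ok:
        return None
    if cfg.block_unsupported and kind not in HEADER_LEN:
        return None
    if cfg.block_addr_mismatch and not addr_match:
        return None
    if cfg.block_header_error and not header_ok:
        return None
    if cfg.remove_header:
        body = mpdu[HEADER_LEN.get(kind, 0):]
        return body or None     # an empty body (for example, an ACK) yields nothing
    return mpdu
```

Setting `remove_header=False` reproduces the host-side MAC scenario from the text: ACK and Data headers reach the host unchanged.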

The module Channel State gathers CCA status information, including energy detection and signal detection, and information about received frames, including frame end timing validity and value and the DIFS/EIFS indicator, and provides this information to MAC TX.

3.1.2 TX

3.1.2.1 Overview

The TX baseband operates in the baseband clock domain of 250 MHz. Its block diagram is shown in Figure 26.

The data source block selects the source for the transmitter. By default, data is taken from a host-to-target FIFO; you can disable this feature when the TX MAC is bypassed. The TX MAC accepts TX requests to start after SIFS (ACK frames) or after the backoff procedure (data frames). It multiplexes the requests in priority order and generates a TX start request for the PHY. The TX Bit Processing module serializes the bytes received from the TX MAC and scrambles, encodes, punctures, and interleaves these bits. The TX I/Q Processing module modulates the bits according to the settings of the TX vector, which is defined in the IEEE specifications and collects all TX parameters. The module furthermore applies channel duplication and rotation as needed for the 802.11ac format. The modulated symbols are then transformed into the time domain using an IFFT. The resulting I/Q samples are transferred to the TX to RF FIFO, to an internal loopback FIFO for operation without RF, and to a target-to-host FIFO for debugging purposes (refer to Figure 26).

Figure 26 TX Baseband Block Diagram

3.1.2.2 MAC MPDU Assembly

The current design can generate ACK frames and A-MPDU frames in the format described in Section 1.2. The frames consist of a header and an optional body. The header and body form the MPDU block, which the MAC TX completes into an MPDU by appending the FCS.

You can use the TX Data Source and TX MAC Bypass modules to bypass all MAC TX processing. When bypassed, all bytes from the T2H TX Data FIFO are streamed directly to the PHY TX. In this case, the host must ensure that the FIFO contains valid MPDU data.

The header generation modules use the frame configuration cluster, which contains all supported header fields. The MAC Frame Header Generator serializes this cluster into a continuous byte stream.

An ACK frame is generated with the help of the MPDU generation request register filled by the MAC RX. As the ACK carries no body, only the header must be serialized. When the ACK frame is ready to send, a TX after SIFS request (see Section 3.1.2.3) is triggered.
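The ACK serialization can be sketched in Python as below. The ACK MPDU is 14 bytes long: Frame Control (type control, subtype ACK), Duration, the receiver address (taken from Address 2 of the received data frame), and the FCS. The sketch assumes no fragmentation, so Duration is 0, and uses `zlib.crc32` for the FCS; `build_ack` is a hypothetical name.

```python
import zlib

def build_ack(receiver_addr: bytes, duration_us: int = 0) -> bytes:
    """Serialize an ACK MPDU: Frame Control, Duration, RA, FCS (14 bytes)."""
    if len(receiver_addr) != 6:
        raise ValueError("receiver address must be 6 bytes")
    frame_control = bytes([0xD4, 0x00])          # type 1 (control), subtype 13 (ACK)
    header = frame_control + duration_us.to_bytes(2, "little") + receiver_addr
    fcs = zlib.crc32(header).to_bytes(4, "little")
    return header + fcs                          # no frame body for ACK frames
```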

To generate a DATA frame, an ICP TX message (see Section 6.2.2) is decoded from the T2H TX Data FIFO. The header of the ICP message is converted into a frame configuration, which is then used, together with the payload data from the ICP TX message, to create the A-MPDU block byte stream. This step is coordinated by the Data Manager state machine in Prepare MPDU, which also triggers the TX after backoff request (refer to Section 3.1.2.3) to send the DATA frame.

3.1.2.3 MAC TX

The module MAC TX implements low-level, latency-critical MAC transmission functionality, that is, the timing-aligned provision of payload data and associated control information to the TX PHY. The inputs to the module are MPDU blocks, which include the MAC header and frame body. These blocks are extended by the FCS field to form complete MPDUs. For transmissions in 802.11ac format, the module generates A-MPDUs by adding delimiter fields and padding to the MPDUs. Version 1.1 of the 802.11 Application Framework supports only A-MPDUs with a single MPDU. The module supports backoff counting and handles Clear Channel Assessment (CCA) information provided by MAC RX. Furthermore, it ensures correct interframe spacing for the following scenarios:

SIFS before ACK packet transmission
DIFS/EIFS before Data packet transmission

The module implements two transmission types:

Transmissions after backoff: Transmission in normal slots using CCA evaluation and backoff counting, with application of DIFS or EIFS as needed.
Transmissions after SIFS: Transmission at the end of the SIFS period without any CCA evaluation. It is used, for instance, for ACK frame transmission.

As shown in Figure 27, the module consists of three major submodules:

Timing Control
Backoff
Data Pump

All three modules are described in the following sections in more detail. Notice that the overall internal structure of MAC TX follows the concept of the IEEE 802.11 SDL specifications. Refer to Section J.5 of [1] for more information about the SDL specifications.

Figure 27 MAC TX Block Diagram

The module Timing Control generates the timing information (timing signals) for the transmission part. At startup, a regular slot timing pattern is generated. The pattern consists of two signals, which are also shown in Figure 27:

Signal slot M2 start: Provided to the module Backoff; refers to the timing instant of the CCA check. The name M2 start and the other signal names described below are derived from the identifiers used in Figure 9-14 of [1].
Signal slot M2 end: Provided to the module Data Pump; refers to the timing instant at which a packet transmission is issued to the TX PHY, taking into account the processing delay of the TX PHY and RF components.

The Timing Control module consumes timing information from the MAC RX. For frames received with valid length information, regardless of the FCS result, MAC RX provides a reference to the frame end timing to MAC TX. Based on this information, the Timing Control module determines the correct interframe spacing and generates timing signals for transmissions after SIFS (signal slot M1 end) and for transmissions in regular slots after DIFS or EIFS, depending on whether the frame was received with a correct or an erroneous FCS.
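The interframe spaces that Timing Control must produce can be worked out for the 802.11a OFDM PHY at 20 MHz (SIFS 16 µs, slot 9 µs). The Python sketch below assumes EIFS = SIFS + DIFS + the duration of an ACK sent at the lowest mandatory rate of 6 Mb/s (N_DBPS = 24), as defined in [1]. The names are hypothetical, and the framework's internal clock-cycle representation is not modeled.

```python
import math

SIFS_US = 16                          # 802.11a, 20 MHz channel spacing
SLOT_US = 9
DIFS_US = SIFS_US + 2 * SLOT_US       # 34 us

def ofdm_txtime_us(psdu_bytes: int, n_dbps: int) -> int:
    """Non-HT PPDU duration: 16 us preamble + 4 us SIGNAL + 4 us per data symbol."""
    n_sym = math.ceil((16 + 8 * psdu_bytes + 6) / n_dbps)   # SERVICE + PSDU + TAIL
    return 20 + 4 * n_sym

ACK_BYTES = 14
EIFS_US = SIFS_US + DIFS_US + ofdm_txtime_us(ACK_BYTES, n_dbps=24)   # ACK at 6 Mb/s

def ifs_after_rx_us(fcs_ok: bool) -> int:
    """Interframe space before the next backoff-based transmission."""
    return DIFS_US if fcs_ok else EIFS_US
```

With these values, a reception with FCS error delays the next transmission by 94 µs instead of 34 µs.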

Figure 28 Interframe Timing Relationships

The module Backoff performs the backoff procedure as defined in [1] Section 9.3.3. The backoff counter is initialized with the desired backoff value, provided through the TX after backoff request. At the appropriate timing instant within each relevant slot, indicated to the module by the slot M2 start signal, the CCA information is checked, and if the channel is idle, the backoff counter is decremented. When the counter reaches zero, this condition is indicated to the module Data Pump through the backoff done signal.
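The slot-by-slot behavior of the Backoff module can be modeled in Python as below. Each input element stands for the CCA state sampled at one slot M2 start instant; the counter decrements only in idle slots and freezes, without being reset, while the channel is busy. The function name and the convention that a backoff value of 0 completes in the first slot are assumptions.

```python
from typing import Iterable, Optional

def backoff_done_slot(backoff: int, cca_busy: Iterable[bool]) -> Optional[int]:
    """Return the index of the slot in which 'backoff done' is signaled.

    cca_busy yields one CCA sample per slot M2 start instant (True = busy).
    Returns None if the counter has not reached zero when the input ends.
    """
    counter = backoff
    for slot, busy in enumerate(cca_busy):
        if counter > 0 and not busy:
            counter -= 1          # decrement only in idle slots
        if counter == 0:          # busy slots freeze the counter, they do not reset it
            return slot
    return None
```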

The module Data Pump coordinates the actual data transmission functionality. The module accepts requests for transmission with or without application of the 802.11 backoff procedure, referred to as transmission after backoff and transmission after SIFS respectively. Those requests are processed based on the timing and backoff information described earlier in the section. The module adds the FCS field to the MPDU blocks to generate complete MPDUs. FCS calculation is done based on IEEE 32-bit CRC as specified in [1] Section 8.2.4.8. In the 802.11ac mode, the module generates A-MPDUs by adding delimiter fields and padding. The MPDUs or A-MPDUs together with associated control information are sent to the PHY SAP TX. The module also provides information about active transmissions to the Backoff module to ensure that this is taken into account during backoff counting. At a given instant of time, only one pending or active transmission request is allowed for transmission after backoff and one for transmission after SIFS. To enable higher level MAC entities to control the data flow to MAC TX accordingly, status information is provided by the MAC TX module.

In addition to the functions described previously, the module MAC TX also provides statistics. Version 1.1 of the 802.11 Application Framework provides the following statistics:

Number of TX after SIFS requests detected
Number of TX after SIFS requests completed
Number of TX after backoff requests detected
Number of TX after backoff requests completed

3.1.2.4 TX Bit Processing

The purpose of the TX Bit Processing module is to generate the signal fields and enqueue the PSDU into the data stream. This stream is then serialized, scrambled, encoded, punctured, and interleaved before it is passed to TX I/Q Processing. Its block diagram is shown in Figure 29. The types of the data path and the elements of the control path are listed in Table 10.

Figure 29 TX Bit Processing Block Diagram

Module	Output Data Type	Output Control Information
MAC TX	U8
TX PHY State Machine	U32	TX bit processing parameter, length⁵, enable scrambler
Bit Serializer	Boolean	enable scrambler
Scrambler	Boolean
Convolutional Encoder	Boolean array (2 elements)
Puncturer	Boolean
Interleaver	Boolean	packet configuration

Table 10 TX Bit Processing Data Types and Control Information

The first module in TX Bit Processing is the TX PHY State Machine, which encodes the signal fields according to 18.3.4 of [1] (L-SIG), 22.3.8.3.3 of [2] (VHT-SIG-A), and 22.3.4.8 of [2] (VHT-SIG-B). The Data Generator VI furthermore turns the PSDU into a sequence of SERVICE field, PSDU data, TAIL, and PADDING for the 802.11a format according to 18.3.5 in [1], or into a sequence of SERVICE field, PSDU data, PADDING, and TAIL for the 802.11ac format according to 22.3.4.9 in [2]. The Generator outputs are combined using the EDSC pattern (see Section 6.1.1). TX Bit Processing has a head start of two OFDM symbols so that the first bits are available when TX I/Q Processing needs them.
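For the 802.11a format, the number of OFDM symbols and PADDING bits produced by the Data Generator follows from N_DBPS. The Python sketch below applies the formulas from 18.3.5 of [1]; it does not cover the 802.11ac order (SERVICE, PSDU, PADDING, TAIL), whose symbol count is computed differently. `nonht_data_field` is a hypothetical name.

```python
import math

SERVICE_BITS, TAIL_BITS = 16, 6

def nonht_data_field(psdu_bytes: int, n_dbps: int) -> dict:
    """Bit budget of the 802.11a DATA field: SERVICE | PSDU | TAIL | PADDING."""
    useful = SERVICE_BITS + 8 * psdu_bytes + TAIL_BITS
    n_sym = math.ceil(useful / n_dbps)        # whole OFDM symbols needed
    n_data = n_sym * n_dbps                   # bits carried by those symbols
    return {"n_sym": n_sym, "n_data": n_data, "n_pad": n_data - useful}
```

For a 100-byte PSDU at 36 Mb/s (N_DBPS = 144), this gives 6 OFDM symbols and 42 padding bits.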

Each signal field is generated in one burst. The data bits are generated as one continuous burst per OFDM symbol, depending on N_DBPS. A small FIFO with a four-wire handshake ensures that bytes for at least one OFDM symbol are available. Because TAIL and PADDING bits are generated as part of the corresponding burst, in the worst case the bit processing chain generates bits for up to two OFDM symbols in one burst; a FIFO in the TX IQ Processing Data Assembler compensates for this.

The next downstream module is the Bit Serializer module, which converts the data fields and PSDU data into a stream of one bit per clock cycle. A FIFO at the module input ensures that the module can handle the incoming data rate. The maximum number of data bits per symbol, N_DBPS, is 720 (802.11ac, 40 MHz, MCS 9). Because PSDU data is given in bytes, the FIFO must store at least 90 bytes (720/8).
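The worst-case value of N_DBPS, and with it the FIFO depth, follows from N_DBPS = N_SD · N_BPSCS · R. The Python sketch below assumes a single spatial stream and the data subcarrier counts 48 (802.11a), 52 (VHT20), and 108 (VHT40). Combinations where N_DBPS is not an integer, such as VHT20 with MCS 9, are rejected because [2] does not allow them. The names are hypothetical.

```python
from fractions import Fraction

N_SD = {"802.11a": 48, "VHT20": 52, "VHT40": 108}   # data subcarriers, 1 spatial stream

def n_dbps(n_sd: int, n_bpscs: int, rate: Fraction) -> int:
    """N_DBPS = N_SD * N_BPSCS * R; raises if the result is not an integer."""
    value = n_sd * n_bpscs * rate
    if value.denominator != 1:
        raise ValueError("N_DBPS is not an integer for this combination")
    return int(value)

worst = n_dbps(N_SD["VHT40"], 8, Fraction(5, 6))   # 256-QAM, R = 5/6 (VHT MCS 9)
fifo_bytes = worst // 8                             # PSDU data arrives as bytes
```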

After bit serialization, scrambling, convolutional encoding, puncturing, and interleaving are applied as described in 18.3.5.5–18.3.5.7 of [1] and 22.3.10.4–22.3.10.8 of [2]. The scrambler and the encoder must be reset before the first bit of the data field is processed. The Scrambler module is bypassed for signal fields. The Puncturer module serializes the two streams of the convolutional encoder using the puncturing patterns of Figures 18-9 and 20-11 of [1]. A FIFO is used at the input of the module because the data rate is higher at the input. The Interleaver applies the BCC interleaver operations defined in Section 18.3.5.7 of [1] and Section 22.3.10.8 of [2]. The write operation into the memory is based on Equation 22-77 of [2], which applies the first permutation. The read operation is based on Equation 22-82 of [2], which applies the second permutation. Reading starts as soon as all bits of the current OFDM symbol are saved to memory. A double-page memory is used, which enables reading and writing at the same time. The output of the Interleaver and the modulation scheme in use are provided at the output of the TX Bit Processing module.
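The scrambler, encoder, and puncturer can be sketched in Python as follows. The scrambler uses the generator x^7 + x^4 + 1, the encoder uses the K = 7 generators 133 and 171 (octal) with output A first, and the puncturing table lists which A and B bits are kept per period: 2/3 and 3/4 as in Figure 18-9 of [1], and 5/6 as used by the HT and VHT PHYs. This is an illustrative bit-serial model, not the FPGA implementation; the function names are hypothetical, and the patterns should be checked against the standard before reuse.

```python
from typing import List, Tuple

def scrambler_sequence(seed: int, n: int) -> List[int]:
    """Generate n bits of the x^7 + x^4 + 1 sequence (bit 6 = x^7, bit 3 = x^4)."""
    state, out = seed & 0x7F, []
    for _ in range(n):
        bit = ((state >> 6) ^ (state >> 3)) & 1   # feedback from taps x^7 and x^4
        state = ((state << 1) | bit) & 0x7F       # shift feedback bit into x^1
        out.append(bit)
    return out

def scramble(bits: List[int], seed: int) -> List[int]:
    """XOR data bits with the scrambling sequence (descrambling is identical)."""
    return [b ^ s for b, s in zip(bits, scrambler_sequence(seed, len(bits)))]

G0, G1 = 0o133, 0o171   # generator polynomials; MSB = current input bit

def conv_encode(bits: List[int]) -> List[Tuple[int, int]]:
    """Rate-1/2, K = 7 convolutional encoder; returns (A, B) output pairs."""
    reg, out = 0, []                           # all-zero state: encoder has been reset
    for b in bits:
        reg = (reg >> 1) | (b << 6)            # bit 6 = current input, bit 0 = oldest
        a = bin(reg & G0).count("1") & 1       # modulo-2 sum of the G0 taps
        b_out = bin(reg & G1).count("1") & 1   # modulo-2 sum of the G1 taps
        out.append((a, b_out))
    return out

# Keep flags per puncturing period for the A and B streams (1 = transmit).
PUNCTURE = {
    "1/2": ([1], [1]),
    "2/3": ([1, 1], [1, 0]),
    "3/4": ([1, 1, 0], [1, 0, 1]),
    "5/6": ([1, 1, 0, 1, 0], [1, 0, 1, 0, 1]),
}

def puncture(pairs: List[Tuple[int, int]], rate: str) -> List[int]:
    """Serialize (A, B) pairs into one bit stream, dropping the stolen bits."""
    keep_a, keep_b = PUNCTURE[rate]
    period, out = len(keep_a), []
    for i, (a, b) in enumerate(pairs):
        if keep_a[i % period]:
            out.append(a)
        if keep_b[i % period]:
            out.append(b)
    return out
```

For rate 3/4, this yields the order A0 B0 A1 B2, that is, B1 and A2 are stolen in each period.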

Module	Output Timing
L-SIG Generator	U32 on start of L-SIG processing
VHT-SIG-A Generator	2 U24 on start of VHT-SIG-A processing burstwise
VHT-SIG-B Generator	U32 on start of VHT-SIG-B processing
Data Generator	N_DBPS/8 U32 / OFDM symbol burstwise
Bit Serializer, Scrambler, Convolutional Encoder	N_DBPS bits or arrays of bits / OFDM symbol burstwise
Puncturer	N_CBPS bits / OFDM symbol burstwise
Interleaver	N_CBPS bits / OFDM symbol burstwise

Table 11 TX Bit Processing Transfer Timing

The output timing of the submodules is given in Table 11. All submodules of the TX PHY state machine generate data when the enable signal from the OFDM symbol trigger type module is asserted. These modules need up to two U32 words. Starting with the OFDM symbol for the data field, the Data Generator provides the required number of bytes. After the Bit Serializer, N_DBPS clock cycles are needed to complete the transfer. The convolutional encoder doubles the number of bits, but because each transfer carries an array, the number of transfers does not change. After puncturing, N_CBPS bits remain.

Figure 30 TX Bit Processing Latency

The latency of the bit processing chain depends on the format, bandwidth, and MCS. Similar to Table 11, Figure 30 refers to the two corner cases: Non-HT mode with MCS 0, and the highest MCS at the highest bandwidth. The latency is given for the first subcarrier of the packet, that is, the time the TX bit processing chain needs from the start trigger until the first valid bit is provided. Most of the modules have a fixed latency.

The Interleaver must store the bits of one complete OFDM symbol. The read operation starts as soon as the last value arrives, and N_CBPS bits must be read before the last sample is available at the output of the Interleaver. The pipeline stages add a latency of 11 clock cycles.

The latencies of the FIFOs in the Bit Serializer and Puncturer are unknown.

3.1.2.5 TX IQ Processing

The purpose of the TX IQ Processing module is to add the training fields and to convert the bits from TX Bit Processing into baseband I/Q samples. The OFDM symbol trigger of the RF loop is used to clock the generation of the OFDM symbols (see section 2.3.1). The block diagram is illustrated in Figure 31. The data types and control information are listed in Table 12.

Figure 31 TX IQ Processing Block Diagram

Module	Output Data Type	Output Control Information
TX Bit Processing	Boolean
Create Packet Structure		Field Map Subcarrier Timing TX IQ Processing Parameter
L-STF Assembler	CFX 3.13⁶
Assembler Modules	CFX 2.14
Channel Duplication	CFX 2.14
Channel Rotation	CFX 2.14
IFFT Prescale	CFX 0.16
Xilinx IFFT	CFX 3.13

Table 12 TX IQ Processing Data Types and Control Information

The Create Packet Structure modules create a timing structure, a field map, and the processing parameters that remain constant over an OFDM symbol. These parameters can include bandwidth, CP length, channel duplication, channel rotation, and tone scaling factor.

The field map controls which field is generated for the current OFDM symbol. Using the enable driven stream combiner (EDSC) pattern (see Section 6.1.1) for field generation reduces latency through parallel execution. There is one assembler for each training field, one for the pilots, and one for the bits taken from the TX Bit Processing module.

TX IQ Processing has to start delivering IQ data as soon as possible after the TX start request is triggered. Because the IFFT takes about half an OFDM symbol (as described in Section 3.1.2.7), the L-STF is pregenerated in the time domain, and its I/Q data is stored in a block RAM. A memory bank is reserved for each combination of bandwidth and primary subband. Because L-STF is a Non-HT field, there is no need to distinguish between 802.11a and 802.11ac. For 802.11a, an additional bank exists for a DC-centered signal. Because L-STF is a repeating sequence in the time domain with a period of 0.8 µs, only 64 samples need to be stored.
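Why 64 samples suffice can be checked numerically. The Python sketch below builds the 20 MHz L-STF from its frequency-domain definition in [1] Clause 18 (tones on every fourth subcarrier, values ±(1 + j) scaled by sqrt(13/6)) and evaluates a zero-padded 256-point inverse DFT. It assumes a sample rate of 80 MS/s (256 samples per 3.2 µs symbol), which the 256-point IFFT implies but the text does not state, and it does not model the duplicated and rotated 40 MHz variant. The names are hypothetical.

```python
import cmath
import math

# Non-zero 20 MHz L-STF tones: subcarrier index -> sign of (1 + j), before scaling.
LSTF_TONES = {-24: 1, -20: -1, -16: 1, -12: -1, -8: -1, -4: 1,
              4: -1, 8: -1, 12: 1, 16: 1, 20: 1, 24: 1}

def lstf_time_domain(n_fft: int = 256) -> list:
    """One zero-padded n_fft-point inverse DFT of the 20 MHz L-STF."""
    scale = math.sqrt(13 / 6)
    samples = []
    for n in range(n_fft):
        acc = 0j
        for k, sign in LSTF_TONES.items():
            # negative k needs no wrap-around: exp() is periodic in k with period n_fft
            acc += sign * scale * (1 + 1j) * cmath.exp(2j * math.pi * k * n / n_fft)
        samples.append(acc / n_fft)
    return samples

x = lstf_time_domain()
period = 256 // 4   # tones on every 4th subcarrier -> 64-sample (0.8 us) period
```

Because every occupied tone index is a multiple of 4, the 256-sample symbol repeats every 64 samples, so one period is enough to regenerate the whole field.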

The remaining training fields are generated according to the sections 22.3.4.3 (L-LTF), 22.3.4.6 (VHT-STF), and 22.3.4.9 (VHT-LTF) of [2].

The L-DATA and VHT-DATA fields are built from bits generated by the TX Bit Processing modules. The bit stream is buffered in a FIFO sized for up to three OFDM symbols: the bit processing head start, the current OFDM symbol, and the last OFDM symbol if it is filled with padding. Besides applying the correct modulation, the module also rotates VHT-SIG-A2 according to 22.3.8.3.3 in [2].

The pilot tones are inserted according to 22.3.10.10 of [2] using the information from the field map.

After the assembler modules, the channels are duplicated and rotated according to sections 22.3.4.x and 22.3.7.5 of [2]. Here the Create Packet Structure modules ensure correct settings for handling channel duplication and rotation.

The next downstream module is the IFFT Prescale. This module applies tone field scaling according to 22.3.7.4 of [2] to ensure that the time domain power of VHT modulated fields does not exceed the time domain power of pre-VHT modulated fields (each summed over all transmit channels). The scaling factor is determined in the Create Packet Structure module and depends on bandwidth and field type (refer to Table 22-8 of [2]).

The last module is the IFFT, a wrapper around the Xilinx FFT core. It performs a 256-point IFFT using the “Radix 4, Burst I/O” architecture, similar to RX IQ Processing. In addition, the configuration input is used to enable cyclic prefix insertion by the core. The core configuration settings, such as this input, are dynamic and are provided in parallel with the first sample of each OFDM symbol. A small FIFO is placed before the FFT input to compensate for the longer execution time caused by the guard interval GI2 of L-LTF-1. The FFT output is shifted in frequency using a toggling negation. The fixed-point format at the output is CFX 3.13, based on the requirements of Section 2.3.3.3.
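The identity behind a toggling negation can be checked with a small Python sketch: multiplying the n-th IFFT output sample by (-1)^n gives the same result as circularly shifting the IFFT input by N/2 bins, which converts between a DC-centered subcarrier layout and the natural FFT order. That this is exactly why the framework applies the negation is an assumption; the plain O(N^2) `idft` and the other names are illustrative only.

```python
import cmath
import math

def idft(X):
    """Plain inverse DFT: x[n] = (1/N) * sum_k X[k] * exp(j*2*pi*k*n/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def toggle_negate(x):
    """Multiply sample n by (-1)^n, that is, negate every other sample."""
    return [v if n % 2 == 0 else -v for n, v in enumerate(x)]

N = 8
X = [complex(k + 1, -k) for k in range(N)]     # arbitrary test spectrum
X_rotated = X[N // 2:] + X[:N // 2]            # swap spectrum halves (shift by N/2 bins)
lhs = toggle_negate(idft(X))
rhs = idft(X_rotated)
```

Because the negation needs only a sign flip per sample, it is cheaper than reordering the 256 input bins before the core.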

Figure 32 TX IQ Processing Latency

There are two latency paths in TX I/Q Processing. One serves the precalculated L-STF samples in the time domain. This path has a latency of only five cycles, which allows the PHY to ensure that a packet starts at the RF interface when it is triggered inside the PHY. The second path serves all other symbols, which are created using the path shown in Figure 32. The FFT latency is smaller than the value reported by the Xilinx core because the reported value includes loading of all 256 samples. During the packet, the FFT executes and unloads samples in 616 clock cycles after the last sample arrives. The remaining clock cycles per OFDM symbol are used to transfer data from the input FIFO to the FFT core. By the time the last sample is available at the input, the FIFO is empty, so that sample is passed to the core as fast as possible. The delay of the FIFO is unknown. All other modules have a fixed latency.

3.1.2.6 Timing

The overall timing of the TX chain, including TX Bit Processing and TX IQ Processing, is shown in Figure 33 for packets in 802.11ac format. The timing is similar for 802.11a packets. Time is represented on the horizontal axis. The vertical axis shows only a few module outputs that are important or that change the transfer timing. The colored rectangles correspond to the data values of one OFDM symbol. Their size and placement along the time axis reflect the latencies and transfer timings of the modules. The black arrows show important control signals inside the processing chain and between PHY and MAC, based on the timing information.

Figure 33 TX PHY Timing for 802.11ac Packets

Figure 33 shows the timing of the transmitter for an 802.11ac packet. The RF transmission starts with the start of packet trigger, which coincides with the start of the first OFDM symbol. This is possible because the L-STF is read from memory in the time domain, with a latency of only a few cycles. In parallel, the bit processing starts encoding the bits for the L-SIG, and the I/Q processing starts assembling the L-LTF. The head start of four symbols for the bit processing and two symbols for the I/Q processing is maintained during packet generation. This design ensures that all samples arrive in time for I/Q processing and RF.

Assembling the bits takes only a few cycles. The redundant bits inserted by convolutional encoding and puncturing, according to the code rate, lengthen the run of valid samples at the end of TX Bit Processing. The samples leave the bit processing as a burst because the interleaver handles data in groups of N_CBPS bits.

The assemblers in the TX IQ Processing module append training fields and add pilots to the fields generated by the TX Bit Processing module. Each OFDM symbol contains 256 I/Q samples. The IFFT transforms this I/Q data into the time domain and adds the guard interval. This transformation takes 360 cycles plus the FIFO delay.

In case of an invalid TX Request, there is no packet generation at all and the PHY TX Request handler generates the TX end indication immediately.

3.1.3 RF

The digital downconversion (DDC) and digital upconversion (DUC) modules are based on the PXIe Streaming project templates of USRP and NI 579X. Their block diagrams are shown in Figure 34 and Figure 35. In the downconversion path, a DC Offset Correction module estimates and compensates the residual DC offset from the RX LO. This module mitigates the impact on the autocorrelation computation within the Synchronization block. The DC Offset Correction module estimates the offset by averaging over 512 samples. After each averaging window, the correction value is increased or decreased by one LSB. Over time, the correction value iteratively approaches the DC offset.
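The iterative DC Offset Correction can be modeled as a sign-step tracker: after each 512-sample window, the correction moves by one LSB toward the mean of the corrected samples. This Python sketch assumes real-valued integer samples (I and Q would be tracked separately) and a step size of one unit; the function and parameter names are hypothetical.

```python
from typing import List, Sequence

def dc_offset_tracker(samples: Sequence[int], window: int = 512, lsb: int = 1) -> List[int]:
    """Return the correction value after each complete averaging window."""
    correction, history = 0, []
    for start in range(0, len(samples) - window + 1, window):
        block = samples[start:start + window]
        residual = sum(s - correction for s in block) / window   # mean after correction
        if residual > 0:
            correction += lsb      # correction too small: step up by one LSB
        elif residual < 0:
            correction -= lsb      # correction too large: step down by one LSB
        history.append(correction)
    return history
```

A constant offset of 5 is removed after five windows and then stays put, which illustrates the trade-off of this scheme: slow convergence for large offsets, but a stable estimate once reached.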

Figure 34 DDC Block Diagram

Figure 35 DUC Block Diagram

The latencies of DDC and DUC are given in Figure 36 and Figure 37. The Fractional Decimator and Interpolator latencies depend on the ratio of clock rate to sample rate. Because the clock rate differs between the USRP and FlexRIO RF devices, the DDC latency is target-specific. The DUC latency is the same for both target types.

Figure 36 DDC Latency

Figure 37 DUC Latency

The analog parts of the device and the FPGA logic that are not shown in the block diagrams add latency to the RF path. This latency can be measured using RF loopback and the Streaming project templates for the specific target. The results are listed in Table 13.

	USRP RIO 40 MHz BW (Data clock = 120 MHz)	USRP RIO 120 MHz BW (Data clock = 200 MHz)	FlexRIO / FAM (Data clock = 130 MHz)
DDC	57 clock cycles ≈ 0.48 µs	72 clock cycles ≈ 0.36 µs	59 clock cycles ≈ 0.45 µs
DUC	40 clock cycles ≈ 0.33 µs	50 clock cycles ≈ 0.25 µs	40 clock cycles ≈ 0.31 µs
Others (ADC, DAC, … )	100 clock cycles ≈ 0.83 µs	50 clock cycles ≈ 0.25 µs	115 clock cycles ≈ 0.88 µs
RF round trip time	197 clock cycles ≈ 1.64 µs	172 clock cycles ≈ 0.86 µs	214 clock cycles ≈ 1.65 µs

Table 13 RF Latency
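The entries of Table 13 are consistent with adding the three stage latencies and dividing clock cycles by the data clock in MHz to obtain microseconds. The short Python check below reproduces the round trip row; the dictionary layout and the function name are only illustrative.

```python
RF_LATENCY = {
    # target: (data clock in MHz, DDC cycles, DUC cycles, other cycles)
    "USRP RIO 40 MHz BW": (120, 57, 40, 100),
    "USRP RIO 120 MHz BW": (200, 72, 50, 50),
    "FlexRIO / FAM": (130, 59, 40, 115),
}

def round_trip(target: str) -> tuple:
    """Return (total clock cycles, round trip time in microseconds)."""
    clock_mhz, *stages = RF_LATENCY[target]
    cycles = sum(stages)
    return cycles, cycles / clock_mhz     # cycles / MHz = microseconds
```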

3.2 Host

The host is a sample application that covers all important features of the 802.11 Application Framework. This includes configuring the FPGA target, exchanging payload data, and monitoring the system status.

3.2.1 Host Architecture

The host is split into six loops that handle configuration, data exchange, and status display (refer to Figure 38). The initialization of the system and the cleanup are done on the upper left and upper right of the block diagram, respectively. The system status is passed around with the help of a session cluster that stores all handles to devices and queues used in the host application.

Several queues are used to buffer payload and status information and to pass it around the system (refer to Table 14). All queues are part of the session cluster.

Queue	Purpose
stop	synchronize shutdown of all loops in case an error occurred or the stop button was pressed
send	buffer data that should be transmitted to the target
receive	buffer payload received from the target
receive throughput	store information about received payload size and timestamps (used for throughput graph display)

Table 14 Message Queues Used in Host Application

The loops for data transmission between target and host as well as between host and UDP ports run without throttling to achieve the maximal possible throughput. The loops for configuration and status display run every 100 ms to enable a responsive system. The loop to display events, constellation, channel estimation, and spectral plots runs at a slower rate (every 250 ms), as it consumes a lot of processing power.

Figure 38 Host Schematic Block Diagram
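The interplay of loops, queues, and the shared stop signal can be sketched with Python threads as an analogy to the LabVIEW loops. Only two of the six loops are modeled, the FPGA FIFOs are replaced by a plain queue, and a `threading.Event` stands in for the stop queue; `host_session` and the dictionary keys are hypothetical.

```python
import queue
import threading
from typing import List

def host_session(payloads: List[bytes]) -> List[bytes]:
    """Toy host: a session dict holds the queues, two loops move data,
    and a shared stop signal shuts both loops down together."""
    session = {
        "send": queue.Queue(),        # data to transmit to the target
        "receive": queue.Queue(),     # payload received from the target
        "stop": threading.Event(),    # stands in for the stop queue
    }
    target = queue.Queue()            # hypothetical stand-in for the FPGA FIFOs

    def tx_loop():                    # host -> target, no throttling
        while not session["stop"].is_set():
            try:
                target.put(session["send"].get(timeout=0.05))
            except queue.Empty:
                continue

    def rx_loop():                    # target -> host, no throttling
        while not session["stop"].is_set():
            try:
                session["receive"].put(target.get(timeout=0.05))
            except queue.Empty:
                continue

    workers = [threading.Thread(target=loop) for loop in (tx_loop, rx_loop)]
    for w in workers:
        w.start()
    try:
        for p in payloads:
            session["send"].put(p)
        received = [session["receive"].get(timeout=2) for _ in payloads]
    finally:
        session["stop"].set()         # every loop observes the same stop signal
        for w in workers:
            w.join()
    return received
```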

3.2.2 System Configuration

The system configuration parameters are split into three groups, ordered on the front panel from top to bottom. Parameters that can only be changed at system start are located on the upper right. Parameters that can only be changed while the station is off are located above the gray line in each tab. Parameters that can be changed at any time are located below these lines.

Refer to the HTML documentation included inside the project files tab [5] for a detailed description of each parameter.

3.2.3 AGC

The host offers an automatic gain control (AGC) mechanism to keep the operating point of the system within an optimum range. The main building blocks are shown in Figure 39. Power measurement is done in the baseband as described in Section 3.1.1.2. Based on this power measurement, the AGC adjusts the RF gain to meet the targeted ADC headroom of around -25 dBFS. This target headroom is derived in Section 2.3.3. Figure 39 shows a schematic of the power measurement.

Figure 39 Schematic Power Measurement

The host mechanism does not offer packet-by-packet adaptation. Therefore, the AGC works based on the signal power measured at the start of the previously detected packet. If this value is not within ±1 dB of the target headroom, the RX gain is adjusted accordingly in steps of 0.5 dB.
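One AGC update can be sketched in Python as follows. The sketch assumes that the full deviation from the -25 dBFS target is corrected at once, rounded to the 0.5 dB step size (the framework may instead apply a single step per update), and it clamps the result to a hypothetical gain range of 0 dB to 31.5 dB that is not taken from the text.

```python
def agc_update(gain_db: float, measured_dbfs: float,
               target_dbfs: float = -25.0, tolerance_db: float = 1.0,
               step_db: float = 0.5, gain_limits=(0.0, 31.5)) -> float:
    """Return the new RX gain after one AGC update.

    measured_dbfs is the power measured at the start of the previous packet;
    gain_limits is a hypothetical hardware range.
    """
    error_db = measured_dbfs - target_dbfs
    if abs(error_db) <= tolerance_db:
        return gain_db                        # inside the +/-1 dB window: no change
    steps = round(error_db / step_db)         # quantize the correction to 0.5 dB steps
    new_gain = gain_db - steps * step_db      # too strong -> lower gain, too weak -> raise it
    return min(max(new_gain, gain_limits[0]), gain_limits[1])
```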

Performance

4.1 TX EVM

TX EVM is -35 dB or better for the peak TX power level settings as given in Table 15. Measurements have been taken using NI WLAN Analysis Soft Front Panel Rel. 14.0 and the NI 5644R VST.

	NI USRP-2942		NI USRP-2943		NI 5791
Frequency	Min.	Max.	Min.	Max.	Min.	Max.
2.45 GHz	-8 dBm	19 dBm	-6 dBm	21 dBm	-24 dBm	7 dBm
5.85 GHz	-	-	-18 dBm	10 dBm	-	-

Table 15 Minimum and Maximum Peak TX Power Level for EVM = -35 dB or Better

4.2 Minimum RX Sensitivity

Minimum RX sensitivity for MCS0 (BPSK 1/2) and MCS7 (64-QAM 3/4) is given in Table 16. The measurements follow the rules given in Section 18.3.10.2 of [1]:

The packet error ratio (PER) shall be 10% or less when the PSDU length is 1,000 octets.
The minimum input levels are measured at the antenna connector.

Measurements have been taken using NI WLAN Generation Soft Front Panel Rel. 14.0 and the NI 5644R VST.

	NI USRP-2942		NI USRP-2943		NI 5791
Frequency	MCS0	MCS7	MCS0	MCS7	MCS0	MCS7
2.45 GHz	-81 dBm	-71 dBm	-84 dBm	-72 dBm	-74 dBm	-65 dBm
5.85 GHz	-	-	-75 dBm	-67 dBm	-	-

Table 16 Minimum RX Sensitivity

4.3 Throughput

Table 17 lists the throughput for a DATA-ACK sequence with a backoff value of 15 for different packet sizes.

Packet size (bytes)	802.11a, MCS7	802.11ac VHT20, MCS8	802.11ac VHT40, MCS9
2304	30.9 Mb/s	34.3 Mb/s	45.8 Mb/s
4000	37.7 Mb/s	45.1 Mb/s	67.0 Mb/s

Table 17 Throughput for DATA-ACK Sequence in 802.11a, 802.11ac VHT20 and VHT40
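A simple timing model helps to interpret the 802.11a column of Table 17. The Python sketch below assumes that one DATA-ACK cycle consists of DIFS, 15 backoff slots, the DATA PPDU at 54 Mb/s (MCS7, N_DBPS = 216), SIFS, and an ACK sent at 6 Mb/s, and that the packet size counts only the frame body, with 28 bytes of MAC header and FCS added on top. These assumptions are not stated in the text, the names are hypothetical, and the 802.11ac columns are not modeled.

```python
import math

SIFS_US, SLOT_US = 16, 9                  # 802.11a (20 MHz) timing
DIFS_US = SIFS_US + 2 * SLOT_US           # 34 us

def txtime_us(psdu_bytes: int, n_dbps: int) -> int:
    """Non-HT PPDU duration: preamble + SIGNAL (20 us) plus 4 us per data symbol."""
    return 20 + 4 * math.ceil((16 + 8 * psdu_bytes + 6) / n_dbps)

def data_ack_throughput_mbps(payload_bytes: int, n_dbps_data: int = 216,
                             backoff_slots: int = 15, mac_overhead_bytes: int = 28,
                             ack_bytes: int = 14, n_dbps_ack: int = 24) -> float:
    """Payload bits per DATA-ACK cycle divided by the cycle duration."""
    cycle_us = (DIFS_US + backoff_slots * SLOT_US
                + txtime_us(payload_bytes + mac_overhead_bytes, n_dbps_data)
                + SIFS_US + txtime_us(ack_bytes, n_dbps_ack))
    return 8 * payload_bytes / cycle_us   # bits per microsecond == Mb/s
```

Under these assumptions, the model gives about 30.87 Mb/s for 2304 bytes and 37.69 Mb/s for 4000 bytes, close to the measured 802.11a values.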

Conclusion

The LabVIEW Communications 802.11 Application Framework provides a real-time 802.11 implementation running on NI SDR hardware. This framework enables you to focus on a specific area of research by utilizing the existing link and only making changes or additions where desired.

Because of the flexibility of LabVIEW and the modularity of the framework, you can easily exchange portions of the design for prototyping new algorithms for future wireless systems. In addition, LabVIEW’s native interface between the host and the FPGA means that the design can be partitioned to profit from the parallel execution on the FPGA as well as calculations on the host.

The 802.11 Application Framework provides a comprehensive set of features. The overall architecture described in Section 2 allows the 802.11 Application Framework to be extended toward features such as support for MIMO, VHT 80 MHz bandwidth mode, 802.11p PHY extensions, DCF functionality (retransmissions, RTS/CTS), QoS support, or 802.11p MAC extensions.

This 802.11 Application Framework offers a variety of starting points for wireless research and prototyping.

Questions? Email us at labview.communications@ni.com.

APPENDIX

6.1 Design Pattern

6.1.1 EDSC (Enable Driven Stream Combiner)

The enable driven stream combiner is used to combine information from different sources into one stream. You can use the EDSC to map different fields into the frequency domain of one OFDM symbol. The idea is illustrated for 3 computational VIs in Figure 40.

Figure 40 Enable Driven Stream Combiner

A single Stream Generation VI generates control information for the computational VIs, consisting of one enable signal per computational VI. Only one of these enable signals is asserted at any given time. The asserted enable signal defines the structure of the stream to be generated. Any number of computational VIs can then provide their data at the output whenever their enable signal is asserted. When the enable signal is not asserted, the output is zero. Because of this constraint, a simple OR gate can combine the streams.

There is no throttle control mechanism in this design pattern. The assumption is that a computational module can always provide data when its enable signal is asserted. The Stream Generation VI must ensure this before starting.

If the computation in one of the VIs requires pipelining, the other paths between the Stream Generation VI and the OR gate must be delayed to equalize path latencies. Because the output of a computation usually has a wider bit width than the enable signal, it is recommended to add the delay before the computational VIs, where only the narrow enable signal must be delayed and fewer registers are needed.
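The pattern can be modeled cycle by cycle in software to check the one-hot enable invariant and the OR combination. The following Python sketch is not LabVIEW FPGA code: the functions `stream_generation` and `edsc` and the example VIs `pilot`, `data`, and `null` are hypothetical stand-ins, and pipelining delays are not modeled.

```python
from typing import Callable, List, Sequence

# A computational VI is modeled as a function of its own sample index.
ComputationalVI = Callable[[int], int]


def stream_generation(schedule: Sequence[int], n_vis: int) -> List[List[bool]]:
    """Return one one-hot enable vector per clock cycle.

    schedule[t] is the index of the computational VI that owns cycle t.
    """
    enables = []
    for owner in schedule:
        if not 0 <= owner < n_vis:
            raise ValueError(f"no computational VI with index {owner}")
        enables.append([i == owner for i in range(n_vis)])
    return enables


def edsc(schedule: Sequence[int], vis: Sequence[ComputationalVI]) -> List[int]:
    """Combine the outputs of all computational VIs with a bitwise OR."""
    counters = [0] * len(vis)  # next sample index of each computational VI
    stream = []
    for enable in stream_generation(schedule, len(vis)):
        assert sum(enable) == 1, "exactly one enable may be asserted per cycle"
        word = 0
        for i, vi in enumerate(vis):
            if enable[i]:
                value = vi(counters[i])
                counters[i] += 1
            else:
                value = 0  # a VI whose enable is deasserted drives zero
            word |= value
        stream.append(word)
    return stream


# Hypothetical example VIs that fill the fields of one symbol.
def pilot(n: int) -> int:
    return 0x7F


def data(n: int) -> int:
    return [10, 20, 30][n]


def null(n: int) -> int:
    return 0


print(edsc([0, 1, 1, 2, 1, 0], [pilot, data, null]))  # [127, 10, 20, 0, 30, 127]
```

Because every disabled VI contributes zero, the OR never mixes two values; the combination fails only if two enables are asserted in the same cycle, which the assertion in `edsc` catches.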

6.2 Protocols

6.2.1 Event FIFO

The 802.11 Application Framework uses a target-to-host FIFO that is reserved for event messages, which carry information passed between modules on the target. Each message must be small: all of its information must fit into a 64-bit integer, of which eight bits are reserved for the event ID. Event IDs must be unique, consecutive numbers starting with zero.

Figure 41 Event Arbitration on FPGA

Figure 41 illustrates the arbitration of the FIFO events in one clock domain on the FPGA. Because the timestamp is added in the Event to FIFO VI, that is, when an event wins arbitration rather than when it arrives, the order of the events in the target-to-host FIFO does not necessarily match their order of arrival on the FPGA. An event that occurs in bursts needs multiple arbitration rounds before all of its instances are written to the FIFO, and these additional rounds increase the time difference between the arrival of an event and its timestamp value.

Each event has its own conversion VI, which creates a U64 integer from the event information and contains the condition for writing the event. If the condition is met, the U64 value is written to a local FIFO. This FIFO ensures that no events are lost while they wait for arbitration of the target-to-host FIFO. The conversion VI also contains the unique ID of its event; this ID is available on the output of the VI and is part of the U64 word.
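The following Python sketch shows how a conversion VI might pack an event into a U64 word and queue it in a local FIFO. The position of the 8-bit event ID (here, the top byte) and the names `pack_event`, `unpack_event`, and `convert` are assumptions for illustration; the framework's actual bit layout is not given in this section.

```python
from collections import deque
from typing import Deque, Tuple

EVENT_ID_BITS = 8
PAYLOAD_BITS = 64 - EVENT_ID_BITS  # 56 bits remain for the event data
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1


def pack_event(event_id: int, payload: int) -> int:
    """Build the U64 event word; the event ID is assumed to occupy the top byte."""
    if not 0 <= event_id < (1 << EVENT_ID_BITS):
        raise ValueError("event ID must fit into 8 bits")
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("event data must fit into 56 bits")
    return (event_id << PAYLOAD_BITS) | payload


def unpack_event(word: int) -> Tuple[int, int]:
    """Split a U64 event word into (event ID, event data)."""
    return word >> PAYLOAD_BITS, word & PAYLOAD_MASK


def convert(event_id: int, condition: bool, payload: int, local_fifo: Deque[int]) -> None:
    """Hypothetical conversion step: queue the event only if its condition is met."""
    if condition:
        local_fifo.append(pack_event(event_id, payload))


fifo: Deque[int] = deque()
convert(3, True, 0x1234, fifo)
convert(3, False, 0x5678, fifo)  # condition not met, nothing is queued
print(len(fifo), unpack_event(fifo[0]))  # 1 (3, 4660)
```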

The local FIFO reference and the event ID are provided to a common Event to FIFO VI, which must be instantiated once per event and contains the arbitration logic for the target-to-host FIFO. The current value of the round-robin counter is checked against the event ID given to the instance. If a match is detected, two U64 words are written to the target-to-host FIFO: the first word contains the timestamp, and the second word is the event data read from the local FIFO. Because the event IDs are unique, only one instance of the Event to FIFO VI writes to the FIFO at any time.
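The effect of timestamping at arbitration time can be reproduced with a small cycle-level model. This Python sketch assumes that the target-to-host FIFO accepts one word per clock cycle, so a winning event occupies the writer for two cycles, and that the round-robin counter advances by one event ID per arbitration attempt; the function `arbitrate` and these timing details are assumptions, not taken from the framework.

```python
from collections import deque
from typing import Deque, Dict, List, Sequence, Tuple


def arbitrate(
    arrivals: Dict[int, Sequence[int]], n_events: int, n_cycles: int
) -> List[Tuple[int, int, int]]:
    """Simulate round-robin arbitration of the target-to-host FIFO.

    arrivals[t] lists the event IDs whose write condition is met in cycle t.
    Each returned tuple (timestamp, event_id, arrival_cycle) stands for the two
    U64 words (timestamp word, event word) written for one event.
    """
    local_fifos: List[Deque[int]] = [deque() for _ in range(n_events)]
    written: List[Tuple[int, int, int]] = []
    counter = 0  # round-robin counter shared by all Event to FIFO instances
    busy_until = 0  # first cycle in which the FIFO writer is free again
    for t in range(n_cycles):
        for event_id in arrivals.get(t, ()):
            local_fifos[event_id].append(t)  # the local FIFO holds events that must wait
        if t < busy_until:
            continue  # the second word of the previous event is being written
        if local_fifos[counter]:
            # The timestamp is taken now, at arbitration time, not at arrival time.
            written.append((t, counter, local_fifos[counter].popleft()))
            busy_until = t + 2  # two words: timestamp, then event data
        counter = (counter + 1) % n_events
    return written


# Event 2 arrives before event 1 but is written after it.
print(arbitrate({0: [2], 1: [1]}, n_events=3, n_cycles=6))
# [(1, 1, 1), (3, 2, 0)]

# A burst of event 1: the gap between arrival and timestamp grows (1, 4, 7 cycles).
print(arbitrate({0: [1], 1: [1], 2: [1]}, n_events=3, n_cycles=12))
# [(1, 1, 0), (5, 1, 1), (9, 1, 2)]
```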

Round-robin scheduling and the use of two words per event limit the event FIFO throughput. With N events, one full arbitration round can take up to 2N FIFO writes, so the sustained rate of each event must be lower than one event per 2N FIFO write cycles. Events can arrive at a higher rate for short periods as long as the local FIFO can store them.


6.2.2 ICP

ICP stands for Interprocess Communication Protocol. The Application Framework uses this protocol to transfer data from host to target and vice versa. The idea is to send messages of fixed structure but varying length over a channel that transmits a stream of bytes, such as a DMA FIFO. The design is very simple but open to future enhancements, such as multiplexing different structures in the same stream or recovering from FIFO overflow with the help of a sync word. In the current implementation, every structure starts with a four-byte length field in network byte order (big-endian) that encodes the length of the overall structure, excluding the length field itself. The Application Framework uses two structures: one to transfer data to Target TX (Figure 42) and one to send data received by Target RX (Figure 43).
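The length-prefix framing can be sketched in Python with the standard `struct` module. The helper `frame` and the class `Deframer` are hypothetical names. The sketch handles only the length field described above, not the TX or RX payload layouts, and it does not implement the sync word or overflow recovery mentioned as future enhancements.

```python
import struct
from typing import List

# Four-byte unsigned length in network byte order (big-endian).
LENGTH_FIELD = struct.Struct(">I")


def frame(body: bytes) -> bytes:
    """Prefix a structure with its length, excluding the length field itself."""
    return LENGTH_FIELD.pack(len(body)) + body


class Deframer:
    """Reassemble length-prefixed structures from an arbitrarily chunked byte stream."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Append received bytes and return every structure that is now complete."""
        self._buffer.extend(chunk)
        complete = []
        while len(self._buffer) >= LENGTH_FIELD.size:
            (length,) = LENGTH_FIELD.unpack_from(self._buffer, 0)
            end = LENGTH_FIELD.size + length
            if len(self._buffer) < end:
                break  # wait for the rest of this structure
            complete.append(bytes(self._buffer[LENGTH_FIELD.size:end]))
            del self._buffer[:end]
        return complete


stream = frame(b"abc") + frame(b"hello")
deframer = Deframer()
received = []
for i in range(0, len(stream), 3):  # deliver the stream in 3-byte pieces
    received.extend(deframer.feed(stream[i:i + 3]))
print(frame(b"abc")[:4].hex(), received)  # 00000003 [b'abc', b'hello']
```

Buffering until the full length has arrived is what lets the receiver work on a byte stream such as a DMA FIFO, where reads need not line up with structure boundaries.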

Figure 42 ICP TX Format

The ICP TX structure serializes the TX Request, where the format is mapped to two bits and the MCS to four bits. Power Level is not used in the current design. Scrambler Seed must not be zero, because an all-zero initial state keeps the scrambler's shift register at zero and the data would pass through unscrambled. The TX Request is followed by the MAC payload.
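The restriction on the seed follows from the structure of the 802.11 data scrambler, which the standard [1] defines with the generator polynomial S(x) = x^7 + x^4 + 1. The Python sketch below models that scrambler. Mapping bit 6 of `seed` to the oldest register stage x7 is an assumption about bit ordering, and `scramble` is a hypothetical helper, not the framework's FPGA implementation. With a zero seed the feedback bit stays zero, so the output equals the input.

```python
from typing import List


def scramble(bits: List[int], seed: int) -> List[int]:
    """Frame-synchronous scrambler with generator polynomial x^7 + x^4 + 1.

    seed is the 7-bit initial state; bit 0 holds stage x1, bit 6 holds stage x7.
    Because the register runs independently of the data, applying the same
    function with the same seed also descrambles.
    """
    if not 0 <= seed < 128:
        raise ValueError("seed must be a 7-bit value")
    state = seed
    out = []
    for bit in bits:
        feedback = ((state >> 6) ^ (state >> 3)) & 1  # x7 XOR x4
        out.append(bit ^ feedback)
        state = ((state << 1) | feedback) & 0x7F  # shift the feedback bit into x1
    return out


payload = [1, 0, 1, 1, 0, 0, 1, 0]
print(scramble([0] * 8, 0x7F))  # [0, 0, 0, 0, 1, 1, 1, 0]
print(scramble(payload, 0) == payload)  # True: a zero seed leaves the data unscrambled
print(scramble(scramble(payload, 0x5D), 0x5D) == payload)  # True
```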

Figure 43 ICP RX Format

The ICP RX structure serializes the RX indication, where the Decoding status contains the following bits:

FCS result: the CRC check succeeded
Header recognize done: header recognition has finished
Header recognize error: the header does not contain a decodable frame subtype (the current design supports DATA and ACK)
Address matching: the destination MAC address of the packet equals the device MAC address
Valid: the indication is valid

The RX indication is followed by the MAC payload (MPDU without header information).
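On the host, the Decoding status bits could be evaluated as in the following Python sketch. The bit positions, the class `DecodingStatus`, and the function `accept_frame` are hypothetical (the actual layout is defined in Figure 43), and the sketch assumes that the Header recognize error bit is set when the frame subtype is not decodable.

```python
from enum import IntFlag


class DecodingStatus(IntFlag):
    # Hypothetical bit positions; see Figure 43 for the real layout.
    FCS_RESULT = 1 << 0
    HEADER_RECOGNIZE_DONE = 1 << 1
    HEADER_RECOGNIZE_ERROR = 1 << 2
    ADDRESS_MATCHING = 1 << 3
    VALID = 1 << 4


REQUIRED = (
    DecodingStatus.VALID
    | DecodingStatus.HEADER_RECOGNIZE_DONE
    | DecodingStatus.FCS_RESULT
    | DecodingStatus.ADDRESS_MATCHING
)


def accept_frame(status: int) -> bool:
    """Accept a frame only if it is valid, addressed to this device, and error-free."""
    if status & DecodingStatus.HEADER_RECOGNIZE_ERROR:
        return False  # assumed polarity: the bit flags an undecodable subtype
    return (status & REQUIRED) == REQUIRED


print(accept_frame(int(REQUIRED)))  # True
# Same frame, but addressed to another device:
print(accept_frame(int(DecodingStatus.VALID | DecodingStatus.HEADER_RECOGNIZE_DONE | DecodingStatus.FCS_RESULT)))  # False
```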

Abbreviations

Abbreviation	Meaning
ACK	Acknowledgement
ADC	Analog-to-Digital Converter
AGC	Automatic Gain Control
A-MPDU	Aggregated MPDU
BLER	Block Error Rate
CCA	Clear Channel Assessment
CW	Continuous Wave
DAC	Digital-to-Analog Converter
DCF	Distributed Coordination Function
DIFS	Distributed (coordination function) Interframe Space
EIFS	Extended Interframe Space
FAM	Frontend Adapter Module (RF module)
IFS	Interframe Space
MAC	Medium Access Control Layer
MCS	Modulation and Coding Scheme
MPDU	MAC PDU
NACK	Negative Acknowledgement
OFDM	Orthogonal Frequency-Division Multiplexing
PDU	Protocol Data Unit
PHY	Physical Layer
PLCP	PHY Layer Convergence Protocol
PN	Pseudo Noise
QAM	Quadrature Amplitude Modulation
RF	Radio Frequency
RX	Receive
SDL	Specification and Description Language
SDU	Service Data Unit
SIFS	Short Interframe Space
SISO	Single Input Single Output
TX	Transmit
UDP	User Datagram Protocol
VHT	Very High Throughput

Endnotes

¹See the comments on AGC implementation limitations in Section 1.4.

²The channel width VHT 80 MHz is not supported in version 1.1 of the Application Framework. However, at selected points the design is prepared for VHT 80 MHz support.

³Basic DCF support includes CCA evaluation, timing control for standard-compliant interframe spacing, and support for a fixed backoff wait time. Further functionality, such as random exponential backoff and retransmission support, must be added for complete DCF support.

⁴This reference power level corresponds to the power level of a continuous wave (CW) signal having an amplitude of -3 dBFS at the ADC input.

⁵Defines the actual length of the U32 output.

⁶Bypasses all following modules.

Bibliography

[1] IEEE, "Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications," 2012.

[2] IEEE, "Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications; Amendment 4: Enhancements for Very High Throughput for Operation in Bands below 6 GHz," 2013.

[3] IEEE, "User guide for 802.11ac waveform generator," IEEE 802.11-11/0517r6, 2011.

[4] T. M. Schmidl and D. C. Cox, "Robust Frequency and Timing Synchronization for OFDM," IEEE Transactions on Communications, vol. 45, no. 12, pp. 1613-1621, 1997.

[5] "802.11 Application Framework," National Instruments, 2015.
