1. Detailed Description
The classical Costas loop, which is suitable for BPSK/QPSK demodulation, is shown in Figure 1. The system consists of two parallel tracking loops driven simultaneously by the same VCO (Voltage-Controlled Oscillator) or NCO (Numerically-Controlled Oscillator). The first loop, called the in-phase loop (or I arm), uses the VCO output directly, as in a PLL (Phase-Locked Loop); the second, called the quadrature loop (or Q arm), uses the VCO output shifted by 90 degrees. The I and Q mixer outputs are filtered by single-pole Butterworth low pass filters. The two arm filter outputs are multiplied together, and the product is scaled and filtered to produce the loop error that controls the VCO. The loop error should settle to a steady value when the loop is locked. A negative loop error decreases the VCO increment, which lowers the VCO frequency; a positive loop error increases the VCO increment, which raises it. The low pass filter in each arm must be wide enough to pass the data modulation without distortion.
Figure 1: Costas Loop Block Diagram
The input to the Costas loop is the waveform written as
where m(t) is the BPSK modulation and n(t) is white bandpass noise. The in-phase mixer generates
while the quadrature mixer generates
where the mixer noise terms nmc(t) and nms(t) are the low pass demodulated components of the carrier noise n(t). The output of the multiplier is then
where nsq(t) represents all the signal and noise cross-products. The multiplier of the Costas loop can be thought of as allowing the bit polarity of the in-phase loop to correct the phase error orientation of the tracking loop, thereby removing the modulation. When the phase error ψe(t) is small, the Costas loop has the equivalent linear model in Figure 2.
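In the standard notation for a BPSK Costas loop, and using the symbols defined in the text, the signals described above can be written as follows. This is a sketch under assumptions: the amplitude A, the carrier frequency ω0, the input phase ψ(t), and a mixer scaling that makes the arm amplitudes equal to A are not taken from the original.

```latex
\begin{align}
x(t) &= A\,m(t)\cos\big(\omega_0 t + \psi(t)\big) + n(t) \\
I(t) &= A\,m(t)\cos\psi_e(t) + n_{mc}(t) \\
Q(t) &= A\,m(t)\sin\psi_e(t) + n_{ms}(t) \\
I(t)\,Q(t) &= \tfrac{A^2}{2}\,m^2(t)\sin\big(2\psi_e(t)\big) + n_{sq}(t)
\end{align}
```

Because m²(t) = 1 for BPSK, the signal part of the product depends only on the phase error, which is how the multiplier removes the modulation.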
2. Implementation

Figure 2: BPSK Costas Equivalent Loop Model
In the above figure, Kc is the closed loop gain, which can be expressed as
where gc is defined as follows
and where ωc is the cross-over frequency, Kv is the gain of VCO, and F(s) is the transfer function of the loop filter, which is expressed in the following equation
where T is the sampling interval. The transfer function of the VCO is Kv/s.
3. Power Detect and Lock Detect
An interesting property of Costas loops is that the loop generates signals that can be used for auxiliary purposes as well. This can be seen by re-examining the in-phase mixed signal and noting the following:
This shows that when the loop is locked, the in-phase arm produces an output proportional to the input data. Hence the data can be demodulated directly within the Costas loop after phase lock occurs.
Squaring, low pass filtering, and subtracting the arm voltages produces an output that indicates phase lock [cos(2ψe) → 1 as ψe → 0] and can therefore serve as a lock detector. When ψe = 0, this generates an output proportional to the average signal power, which can also be used for automatic gain control.
Adding the squared arm voltages, rather than subtracting them, produces a measurement of the total RF input power and can therefore be used for RF power control. The modified Costas loop is shown in Figure 3.
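The lock and power measurements described above can be sketched in a few lines. This is a minimal illustration, not the worksheet's implementation: it assumes noise-free arm outputs I = A·m·cos(ψe) and Q = A·m·sin(ψe), takes the power output to be the sum of the squared arm voltages (suggested by the Add block in the Lock hierarchy block), and omits the low pass filters that follow the squarers. The function name `lock_and_power` and the amplitude of 100 are hypothetical.

```python
import math


def lock_and_power(i_arm: float, q_arm: float) -> tuple[float, float]:
    """Return (lock, power) from arm samples that are assumed already smoothed.

    lock  = I^2 - Q^2, proportional to cos(2 * psi_e)
    power = I^2 + Q^2, independent of psi_e
    """
    i_sq = i_arm * i_arm
    q_sq = q_arm * q_arm
    return i_sq - q_sq, i_sq + q_sq


A = 100.0   # hypothetical signal amplitude
m = -1.0    # current BPSK symbol; its sign cancels in both outputs
for psi_e_deg in (0.0, 2.0, 45.0):
    psi_e = math.radians(psi_e_deg)
    lock, power = lock_and_power(A * m * math.cos(psi_e), A * m * math.sin(psi_e))
    print(f"psi_e = {psi_e_deg:5.1f} deg  lock = {lock:9.1f}  power = {power:9.1f}")
```

The lock output falls toward zero as the phase error approaches 45 degrees, while the power output stays constant, which is why only the former is useful as a lock indicator.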
Due to continuing advances in high-speed digital technology, digital implementations of the Costas loop are becoming increasingly attractive. Their advantages include relative insensitivity to temperature variations and aging. More important, however, is that the loop design parameters, such as the loop gain and the loop filter time constant, can be made programmable. The low pass filters HI(s) and HQ(s) have the same form, consisting of a single pole, which can be expressed as
where ωa is the cut-off frequency and H(0) is equal to (tan(ωaT/2)+1)/tan(ωaT/2). The signal flow graph for this first order system is shown in Figure 4.

Figure 4: First Order Low Pass Filter Model
In the above figure, b is calculated as follows
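One common way to obtain such a filter is the bilinear transform with frequency prewarping, which gives b = (1 − tan(ωaT/2)) / (1 + tan(ωaT/2)). The sketch below assumes that design; the formula is not confirmed by the text, but it reproduces the I and Q arm coefficients listed with the example parameters (0.461006 for 12000 Hz and 0.0787017 for 24000 Hz at a 106666.66 Hz sampling rate). The names `lowpass_coefficients` and `OnePoleLowpass` are hypothetical.

```python
import math


def lowpass_coefficients(f_cut_hz: float, f_sample_hz: float) -> tuple[float, float]:
    """Return (b, gain) for y[n] = b*y[n-1] + gain*(x[n] + x[n-1])."""
    t = math.tan(math.pi * f_cut_hz / f_sample_hz)   # tan(w_a * T / 2)
    b = (1.0 - t) / (1.0 + t)                        # assumed feedback coefficient
    gain = t / (1.0 + t)                             # equals 1/H(0) as H(0) is defined in the text
    return b, gain


class OnePoleLowpass:
    """First-order low pass filter with unity gain at DC."""

    def __init__(self, f_cut_hz: float, f_sample_hz: float):
        self.b, self.gain = lowpass_coefficients(f_cut_hz, f_sample_hz)
        self.x_prev = 0.0
        self.y_prev = 0.0

    def step(self, x: float) -> float:
        y = self.b * self.y_prev + self.gain * (x + self.x_prev)
        self.x_prev, self.y_prev = x, y
        return y


FS = 106666.66
print("b for 12000 Hz:", lowpass_coefficients(12000.0, FS)[0])
print("b for 24000 Hz:", lowpass_coefficients(24000.0, FS)[0])
```

The (x[n] + x[n−1]) term places a zero at half the sampling rate, which helps suppress the double-frequency mixer product when the carrier sits near a quarter of the sampling rate.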
The signal flow graph of the first order loop filter is shown in Figure 5.

Figure 5: First Order Loop Filter Model
In the above figure, Kp is the proportional gain, which equals gc, and Ki is the integral gain, which can be expressed as
where ωz is the zero frequency of the loop filter.
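A proportional-plus-integral filter of this kind can be sketched as y[n] = Kp·e[n] + Ki·Σe. The relation Ki = Kp·ωz·T used below is an assumption, not the text's formula; it reproduces the listed acquisition coefficients (Kp = 38.136880 with a 500 Hz zero gives Ki ≈ 1.1232). Under the same relation, the listed tracking Ki (0.140402) corresponds to a 500 Hz zero rather than the 25 Hz given for tracking, so the relation should be treated as a guess. The names `integral_gain` and `LoopFilter` are hypothetical.

```python
import math


def integral_gain(kp: float, f_zero_hz: float, f_sample_hz: float) -> float:
    """Assumed relation Ki = Kp * w_z * T (not confirmed by the source)."""
    return kp * 2.0 * math.pi * f_zero_hz / f_sample_hz


class LoopFilter:
    """Proportional-plus-integral filter: y[n] = Kp*e[n] + Ki*sum(e[0..n])."""

    def __init__(self, kp: float, ki: float):
        self.kp = kp
        self.ki = ki
        self.integrator = 0.0

    def step(self, error: float) -> float:
        self.integrator += self.ki * error   # integral path accumulates the error
        return self.kp * error + self.integrator


ki_acq = integral_gain(38.136880, 500.0, 106666.66)
print("acquisition Ki under the assumed relation:", ki_acq)
acq = LoopFilter(38.136880, ki_acq)
print("response to a constant error of 1:", [acq.step(1.0) for _ in range(3)])
```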
5. Traditional Design Method
The design of the Costas algorithm involves compromises between algorithm complexity and performance objectives. Typically the designer sketches out a signal flow graph of the algorithm, using “Black Boxes” to represent signal processing operations. The computational requirements can be estimated from the signal flow graph by counting the number of multiplies, multiply-accumulates, and additions. A block diagram of the system can then be drawn. Once the algorithm has been worked out on paper, a simulation program may be written to verify that the concept is correct. In the past, the simulation was often written in a high-level language such as C or FORTRAN. Unless the designer is also highly skilled in programming, the simulation software can take a long time to write and debug, because errors in the algorithm can be mistaken for programming errors and vice versa. Analyzing algorithmic trade-offs is even more difficult, because the software must be modified and debugged again each time a subsystem is changed. Viewing and analyzing the results of a simulation typically requires a different software package, or writing and debugging special display programs. Finally, testing can also prove to be very time consuming.
6. Designing with Hypersignal Block Diagram/RIDE
In order to reduce design time and to make the design portable, it would be advantageous to use a tool that would allow the designer not only to document algorithms in a signal flow graph format, but also to automatically generate a simulation from that documentation. The tasks of signal generation, viewing, manipulation, and verification also need to be addressed. These are precisely the needs that are fulfilled by the Hypersignal line of visual DSP software development tools.
Hypersignal Block Diagram/RIDE offers many advantages over other design methods. Designs are entered visually into the Block Diagram as signal flow diagrams. The function and operation of an algorithm are easy to understand when shown in this form, and this is especially valuable when the designer is not personally implementing the algorithm on the target hardware.
Our example is implemented with the following parameters:
Sampling frequency: 106666.66 Hz
In-phase low pass filter cut-off frequency: 12000 Hz
Quadrature low pass filter cut-off frequency: 24000 Hz
Lock low pass filter cut-off frequency: 500 Hz
Loop filter cross-over frequency (Acq): 1000 Hz
Loop filter zero frequency (Acq): 500 Hz
Loop filter cross-over frequency (Track): 125 Hz
Loop filter zero frequency (Track): 25 Hz
Lock threshold value: 9950
Carrier frequency: 27000 Hz
NCO higher frequency limit: 27666.66 Hz
NCO lower frequency limit: 25666.66 Hz
NCO maximum input value: 2000
NCO minimum input value: -2000
Based on the algorithm described in the sections above, we calculated the following coefficients:
The in-phase (I) arm filter coefficient is: bI = 0.461006
The quadrature (Q) arm filter coefficient is: bQ = 0.0787017
The lock detector low pass filter coefficient is: b = 0.970900
The loop filter coefficients for acquisition are: Ki = 1.123229, Kp = 38.136880
The loop filter coefficients for tracking are: Ki = 0.140402, Kp = 4.767110
The NCO gain is: Kv = 0.500000
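The NCO parameters fit together in a simple way: with Kv = 0.5, the input range of ±2000 spans ±1000 Hz, which matches the listed frequency limits if the NCO rests at their midpoint, 26666.66 Hz. The sketch below assumes that reading, f = f_center + Kv·u with the input clamped; the rest frequency and the class name `NCO` are assumptions, not taken from the worksheet.

```python
import math


class NCO:
    """Numerically controlled oscillator with a clamped control input."""

    def __init__(self, f_sample_hz=106666.66, f_low_hz=25666.66, f_high_hz=27666.66,
                 kv=0.5, u_min=-2000.0, u_max=2000.0):
        self.fs = f_sample_hz
        self.f_center = 0.5 * (f_low_hz + f_high_hz)   # assumed rest frequency
        self.kv = kv                                   # Hz per unit of control input
        self.u_min, self.u_max = u_min, u_max
        self.phase = 0.0

    def frequency(self, u: float) -> float:
        """Output frequency in Hz for control input u, after clamping."""
        u = min(max(u, self.u_min), self.u_max)
        return self.f_center + self.kv * u

    def step(self, u: float) -> tuple[float, float]:
        """Return (cos, sin) of the current phase, then advance the phase."""
        c, s = math.cos(self.phase), math.sin(self.phase)
        increment = 2.0 * math.pi * self.frequency(u) / self.fs
        self.phase = (self.phase + increment) % (2.0 * math.pi)
        return c, s


nco = NCO()
for u in (-5000.0, 0.0, 2000.0):
    print(f"u = {u:8.1f}  ->  f = {nco.frequency(u):.2f} Hz")
```

Under this reading, holding the 27000 Hz carrier requires a steady control input of about (27000 − 26666.66) / 0.5 ≈ 666.7, comfortably inside the ±2000 limits.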
Figure 6 presents a worksheet of the Costas loop, which implements the diagram in Figure 3.

Figure 6: Costas Loop Block Diagram Worksheet
In the worksheet, the carrier input is a BPSK signal generated by the hierarchy BPSK block shown in Figure 7. The NCO block performs the VCO function, and two multiply blocks serve as the in-phase and quadrature mixers, performing the phase detector function. The I LPF and Q LPF blocks implement the in-phase and quadrature low pass filters, and another multiply block serves as the error mixer. The hierarchy DLOOPF block shown in Figure 8 performs first order dynamic loop filtering (the bandwidth narrows on the transition from acquisition to tracking mode). The Recursion block closes the feedback path and supplies initial data so that the NCO block input is ready at start-up. Power and lock detection are performed by the hierarchy Lock block shown in Figure 9. The worksheet is processed sample by sample, so Buffer blocks are needed to store samples temporarily for later display. The top display shows the demodulated data, the middle display shows the Costas loop error, and the bottom display shows the lock detector output. A Text Display shows the instantaneous frequency of the NCO.
Note that the Costas loop error is in a transient state during carrier acquisition and then stabilizes when the loop is phase-locked. The effect of phase locking is illustrated by the data demodulated from the I arm.
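The sample-by-sample behavior described above can be imitated with a short simulation. This is a simplified sketch, not a transcription of the worksheet: it uses a unit-amplitude BPSK input, the same 12 kHz filter in both arms, a single loop filter with no lock detector, and hypothetical gains (`KP`, `KI`) and data rate chosen for that unit amplitude. The NCO rest frequency is assumed to be the midpoint of the listed limits.

```python
import math

FS = 106666.66        # sampling frequency (Hz), from the parameter list
F_CARRIER = 27000.0   # carrier frequency (Hz), from the parameter list
F_CENTER = 26666.66   # assumed NCO rest frequency: midpoint of the listed limits
KV = 0.5              # NCO gain (Hz per input unit), from the coefficient list
BIT_RATE = 1000.0     # hypothetical data rate (bit/s)
KP = 2000.0           # hypothetical proportional gain for a unit-amplitude input
KI = KP * 2.0 * math.pi * 500.0 / FS   # hypothetical integral gain (500 Hz zero)


def simulate(n_samples=4000, phase0=1.0):
    """Run the loop sample by sample; return a list of (I, Q, error, f_nco)."""
    t = math.tan(math.pi * 12000.0 / FS)          # 12 kHz arm filters (both arms)
    b, g = (1.0 - t) / (1.0 + t), t / (1.0 + t)
    i_in = i_out = q_in = q_out = 0.0             # arm filter states
    integrator = 0.0
    nco_phase = 0.0
    history = []
    for n in range(n_samples):
        # BPSK source: square-wave data multiplied by a sinusoidal carrier
        bit = 1.0 if int(n * BIT_RATE / FS) % 2 == 0 else -1.0
        x = bit * math.cos(2.0 * math.pi * F_CARRIER * n / FS + phase0)
        # In-phase and quadrature mixers driven by the NCO
        i_mix = 2.0 * x * math.cos(nco_phase)
        q_mix = -2.0 * x * math.sin(nco_phase)
        # Single-pole arm filters
        i_out, i_in = b * i_out + g * (i_mix + i_in), i_mix
        q_out, q_in = b * q_out + g * (q_mix + q_in), q_mix
        # Error mixer followed by a proportional-plus-integral loop filter
        error = i_out * q_out
        integrator += KI * error
        u = max(-2000.0, min(2000.0, KP * error + integrator))
        # NCO update: a positive error raises the NCO frequency
        f_nco = F_CENTER + KV * u
        nco_phase = (nco_phase + 2.0 * math.pi * f_nco / FS) % (2.0 * math.pi)
        history.append((i_out, q_out, error, f_nco))
    return history


tail = simulate()[-1000:]
mean_abs_i = sum(abs(h[0]) for h in tail) / len(tail)
mean_abs_q = sum(abs(h[1]) for h in tail) / len(tail)
mean_f = sum(h[3] for h in tail) / len(tail)
print(f"mean |I| = {mean_abs_i:.3f}, mean |Q| = {mean_abs_q:.3f}, "
      f"mean NCO frequency = {mean_f:.1f} Hz")
```

After the transient, the I arm should carry the data (possibly inverted, since a Costas loop can lock at a phase error of 0 or 180 degrees), the Q arm should be near zero, and the NCO frequency should sit near the carrier.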

Figure 7: BPSK Hierarchy Block
Figure 8: Dynamic Loop Filter
Figure 7 shows the structure of the hierarchy BPSK block, which consists of a Square wave generator, a Sine wave generator, a Mixer block that multiplies the two input waveforms, and a Buffer block that stores data samples for display. The data display shows the BPSK carrier. As shown in Figure 8, the hierarchy block for the dynamic loop filter includes Acquisition Loop Filter, Tracking Loop Filter, Threshold, and 2 to 1 Multiplexer blocks. It has two input connections and two output connections. The top input connection is from the Error Mixer block, and the bottom input connection is from the bottom output of the Lock Detect block. The top output connection carries the filtered Costas loop error, and the bottom output connection is used for checking the loop filter status. When the lock detect value is less than the threshold value, the Acquisition Loop Filter is used and the status is 0; otherwise, the Tracking Loop Filter is used and the status is 1.
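The selection logic of the dynamic loop filter can be sketched as a threshold comparison feeding a two-way multiplexer. The sketch assumes that both filters process every sample and that only the output is switched, which is one plausible reading of the multiplexer arrangement; the function name `make_dynamic_loop_filter` is hypothetical, and the two filters are passed in as plain callables so that any filter implementation can be used.

```python
from typing import Callable, Tuple


def make_dynamic_loop_filter(
    acq_filter: Callable[[float], float],
    track_filter: Callable[[float], float],
    threshold: float,
) -> Callable[[float, float], Tuple[float, int]]:
    """Build a step function returning (filtered_error, status)."""

    def step(error: float, lock_value: float) -> Tuple[float, int]:
        # Assumed: both filters see every sample so their states stay current.
        acq_out = acq_filter(error)
        track_out = track_filter(error)
        if lock_value < threshold:
            return acq_out, 0      # acquisition mode
        return track_out, 1        # tracking mode

    return step


# Usage with proportional-only stand-ins built from the listed Kp values.
dlf = make_dynamic_loop_filter(
    lambda e: 38.136880 * e,   # acquisition Kp
    lambda e: 4.767110 * e,    # tracking Kp
    threshold=9950.0,          # listed lock threshold value
)
print(dlf(1.0, 9000.0))   # lock value below threshold: acquisition filter, status 0
print(dlf(1.0, 9990.0))   # lock value above threshold: tracking filter, status 1
```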

Figure 9: Lock Hierarchy Block
Figure 9 shows the structure of the hierarchy Lock block, which consists of Square blocks, Low Pass Filter blocks, and Add and Subtract blocks. It has two input connections and two output connections. The top input connection is from the in-phase low pass filter, and the bottom input connection is from the quadrature low pass filter. The top output connection is used for power detect, and the bottom output connection is used for lock detect.
7. Code Generation
Our final step here generates an ANSI C code representation of the Block Diagram worksheet. Figure 10 shows the code generator output screen and Figure 11 shows the project view of the Costas Loop hierarchy. The code produced is easily cross-compiled for use on the target architecture.
Applications include carrier recovery in modulation/demodulation communications systems.



