14.2 A 0.55V 16Mb/s 1.6mW Non-Coherent IR-UWB Digital Baseband with ±1ns Synchronization Accuracy

Patrick P. Mercier, Manish Bhardwaj, Denis C. Daly, Anantha P. Chandrakasan

Massachusetts Institute of Technology, Cambridge, MA

IR-UWB radios are finding increasing use in low data rate sensing applications, in part because they can be easily duty-cycled to achieve extreme energy efficiency. Within pulsed radios, non-coherent (NC) RF front ends that use simple square-and-integrate samplers offer significant energy-per-bit savings over their coherent counterparts [1]. However, such samplers lose phase information and accumulate squared noise over the integration period. While this increases the SNR required to relay a bit reliably, the greater challenge is achieving signal synchronization. Telemetry applications often have small payloads (10 to 100 bits) where synchronization time dominates. Furthermore, synchronization performance is being continually pushed to enable positioning capability. Hence, the ultimate advantage of NC receivers relies on their ability to synchronize efficiently.

Previously published NC solutions are deficient in three areas: synchronization algorithms, codes, and reliance on high-resolution clocks. Current algorithms disregard the effects of squaring noise and use matched filters (MFs) [2], which are guaranteed to be optimum only when noise is additive, as is the case in coherent receivers [3,4]. In NC systems, the use of MFs not only increases synchronization time but also makes the system very sensitive to errors in estimating SNR. Repetition codes can simplify algorithms by eliminating the need for MFs, but they require parallel or sliding integrators in addition to higher SNR or longer on-time [5]. These techniques also tie synchronization accuracy to clock speed: a ±1ns resolution (corresponding to 30cm of positioning accuracy) typically requires a 500MHz clock.
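The clock-resolution figures above follow from simple arithmetic, which the Python sketch below reproduces. The function names are hypothetical. The last computation (16 evenly spaced phases of a 32MHz clock) is only an illustration of how a slow multiphase clock can reach roughly the 1.95ns spacing used later in this paper; it is not a description of the chip's DLL.

```python
# Back-of-the-envelope timing arithmetic (illustrative only).
C = 299_792_458.0  # speed of light in m/s


def ranging_error_m(timing_error_s: float) -> float:
    """Distance corresponding to a timing error at the speed of light."""
    return C * timing_error_s


def clock_for_resolution(half_window_s: float) -> float:
    """Clock frequency whose period is twice the +/- timing window, so any
    arrival time lies within half a period of some clock edge."""
    return 1.0 / (2.0 * half_window_s)


def phase_spacing_s(clock_hz: float, phases: int) -> float:
    """Spacing between evenly distributed phases of a slower clock."""
    return 1.0 / clock_hz / phases


print(f"{ranging_error_m(1e-9) * 100:.1f} cm")        # 30.0 cm
print(f"{clock_for_resolution(1e-9) / 1e6:.0f} MHz")   # 500 MHz
print(f"{phase_spacing_s(32e6, 16) * 1e9:.3f} ns")    # 1.953 ns
```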

The proposed digital baseband overcomes these shortcomings without any increase in RF front-end power or complexity via the following techniques: 1) new synchronization codes that require 11× fewer samples than repetition codes and that allow high synchronization accuracy (1ns) using slow clocks (32MHz); 2) a new quadratic correlation algorithm that requires 2 to 4dB lower SNR than MFs and is robust to parameter measurement uncertainties; 3) an algorithmic transformation that reduces the computational complexity of correlations by up to 32×; and 4) a low-voltage, highly parallel VLSI implementation that offers low-latency synchronization. As a result of these techniques, the baseband achieves a peak synchronization accuracy of ±1ns at an SNR of 4dB within 16µs.

The packet structure is shown in Fig. 14.2.1. The preamble consists of repetitions of a PN code (S0) followed by a start-frame delimiter (SFD), header, and payload bits. State-of-the-art codes for NC systems, such as those proposed in the IEEE 802.15.4a standard [6], are not necessarily alias-free; that is, two different chip-period shifts of S0 can produce identical sequences of integrator outputs. Figure 14.2.1 demonstrates this with a toy, 4a-like code. With such codes, guaranteeing no aliasing requires the receiver to shift its integration slots in steps of one chip period (1.95ns), or to integrate over 1.95ns windows. A simple modification of the pulse positions bestows the alias-free property, as Fig. 14.2.1 illustrates. The baseband uses alias-free codes of length 512 chips (0.998µs) with a 31.2ns integration period.
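A minimal sketch of the alias-free test, assuming an idealized noise-free model in which each square-and-integrate output simply counts the pulses falling in its window and the code repeats cyclically. The 8-chip codes and the helper names are hypothetical toys. They are not the 512-chip codes used in the baseband, nor the exact modification shown in Fig. 14.2.1.

```python
from typing import Sequence, Tuple


def integrator_outputs(code: Sequence[int], window: int, shift: int) -> Tuple[int, ...]:
    """Noise-free integrator outputs for a cyclically repeated code delayed
    by `shift` chips; each output counts the pulses in one window."""
    n = len(code)
    delayed = [code[(i - shift) % n] for i in range(n)]
    return tuple(sum(delayed[j:j + window]) for j in range(0, n, window))


def is_alias_free(code: Sequence[int], window: int) -> bool:
    """True if every cyclic chip shift yields a distinct output sequence."""
    assert len(code) % window == 0, "code length must be a multiple of the window"
    seen = {integrator_outputs(code, window, s) for s in range(len(code))}
    return len(seen) == len(code)


# Toy codes with 2-chip integration windows (hypothetical examples).
aliased = [1, 0, 1, 0, 0, 0, 0, 0]     # every pulse on the first chip of a window
alias_free = [1, 1, 0, 1, 0, 0, 0, 0]  # one pulse moved, breaking the symmetry
print(is_alias_free(aliased, 2))       # False
print(is_alias_free(alias_free, 2))    # True
```

In the aliased toy code, delaying the signal by one chip leaves every window count unchanged, so shifts 0 and 1 cannot be told apart from integrator outputs alone.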

The top-level architecture of the digital baseband is shown in Fig. 14.2.2. Every sample duration corresponds to 16 possible signal starts, or phases, separated by 1.95ns. For a code (S0) with n samples, both the number of sample shifts and the phase shift within a sample must be determined, requiring a total of 16n correlations. These correlations are distributed over 16 phase-correlation tiles (PCTs). Each PCT, as shown in Fig. 14.2.3, consists of 8 parallel quadratic correlators (QCORRs) which perform correlations of length n.
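The hypothesis count follows from the numbers above: a 512-chip code sampled every 16 chips gives n = 32 samples, so 16n = 512 (phase, shift) hypotheses are spread over 16 PCTs of 8 QCORRs each. The sketch below checks this arithmetic. The particular assignment of shifts to QCORRs and correlation sets (`assign`) is an assumption for illustration, not the documented hardware mapping.

```python
import math

CHIP_NS = 1.95          # chip period in ns
CHIPS_PER_CODE = 512
PHASES = 16             # chips per integration window = number of PCTs
QCORRS_PER_PCT = 8

n = CHIPS_PER_CODE // PHASES             # samples per code
total = PHASES * n                       # (phase, shift) hypotheses
sets = math.ceil(n / QCORRS_PER_PCT)     # correlation sets each PCT must run


def assign(phase: int, shift: int):
    """Hypothetical mapping of one hypothesis to
    (PCT index, QCORR index within the tile, correlation set)."""
    return phase, shift % QCORRS_PER_PCT, shift // QCORRS_PER_PCT


slots = {assign(p, s) for p in range(PHASES) for s in range(n)}
print(f"code {CHIPS_PER_CODE * CHIP_NS:.1f} ns, window {PHASES * CHIP_NS:.1f} ns")
print(n, total, sets, len(slots))  # 32 512 4 512
```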

To better approximate an ideal maximum-likelihood (ML) synchronizer, each term in the quadratic correlation requires two multiply-accumulate operations, one with the averaged sample and one with its square. To reduce complexity, the quadratic correlation expression is refactored so that correlations are run only once, on the mean and mean-squared ADC samples, rather than on every received code, saving up to 32× in total synchronization energy.
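The refactoring relies on linearity: summing quadratic correlations over R received repetitions equals R times one correlation against the per-sample mean and mean of squares. The sketch below checks this numerically, assuming the quadratic term uses the mean of the squared samples. The coefficients `a` and `b` are random placeholders rather than ML-derived values, and the function names are hypothetical.

```python
import random


def quad_corr(a, b, x):
    """One quadratic correlation: a MAC with each sample and one with its square."""
    return sum(ai * xi + bi * xi * xi for ai, bi, xi in zip(a, b, x))


def correlate_each_code(a, b, codes):
    """Naive form: run the full correlation on every received code repetition."""
    return sum(quad_corr(a, b, x) for x in codes)


def correlate_averages(a, b, codes):
    """Refactored form: average samples and squared samples, then correlate once."""
    r = len(codes)
    mean = [sum(col) / r for col in zip(*codes)]
    mean_sq = [sum(v * v for v in col) / r for col in zip(*codes)]
    # Scaling by r only matters for matching the naive sum; the argmax is unchanged.
    return r * sum(ai * m + bi * m2 for ai, bi, m, m2 in zip(a, b, mean, mean_sq))


rng = random.Random(0)
n, reps = 32, 32
a = [rng.uniform(-1, 1) for _ in range(n)]   # placeholder linear coefficients
b = [rng.uniform(-1, 1) for _ in range(n)]   # placeholder quadratic coefficients
codes = [[rng.randrange(32) for _ in range(n)] for _ in range(reps)]  # 5b samples
naive = correlate_each_code(a, b, codes)
refactored = correlate_averages(a, b, codes)
print(naive, refactored)
```

With R = 32 repetitions, the naive form costs 2nR multiply-accumulates per hypothesis versus 2n for the refactored form (the averaging is shared across all hypotheses), a 32× reduction in correlation work.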

Since the QCORR units, shown in Fig. 14.2.4, perform correlations with an identical but shifted set of coefficients, an efficient pipeline schedule can further reduce hardware and energy costs. Every cycle, a common pair of linear and quadratic coefficients is broadcast simultaneously to all 8 QCORRs, while averaged data is offset between QCORRs using a global circular buffer, as shown in Fig. 14.2.3. Sharing coefficients, as opposed to data, reduces the multiplexing costs of non-local coefficient fetches by 8×. Since n may be as large as 32, the 8 QCORR units in a PCT perform up to 4 sets of correlations. After all computations are complete, the inferred phase is conveyed to the RF front end using a DLL, and the inferred shift is used to skip the right number of samples to achieve codeword alignment. Detection is treated like synchronization, except that it suffices to use only 2 of the 16 PCTs. Custom fine-grain clock gating is employed during detection to reduce the clock-tree power of the unused PCTs. Overall, clock gating reduces dynamic power by 2.7× during idle mode.
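The coefficient-broadcast schedule can be modeled in software to confirm that it computes the same cyclic correlations as a direct evaluation. In this sketch, modular indexing stands in for the global circular buffer and each iteration over `t` stands in for one clock cycle. These are modeling assumptions, not the RTL, and all names are hypothetical.

```python
import random


def direct_scores(a, b, mean, mean_sq):
    """Reference: cyclic quadratic correlation for every sample shift."""
    n = len(a)
    return [sum(a[t] * mean[(t + s) % n] + b[t] * mean_sq[(t + s) % n]
                for t in range(n)) for s in range(n)]


def broadcast_scores(a, b, mean, mean_sq, units=8):
    """Same result, scheduled like a tile of `units` QCORRs: each cycle one
    (linear, quadratic) coefficient pair is broadcast to every unit, and each
    unit reads the data word offset by its own shift."""
    n = len(a)
    scores = [0.0] * n
    sets = 0
    for base in range(0, n, units):           # ceil(n / units) correlation sets
        sets += 1
        shifts = range(base, min(base + units, n))
        for t in range(n):                    # one cycle per coefficient pair
            ca, cb = a[t], b[t]               # single broadcast fetch
            for s in shifts:                  # all units consume it in parallel
                k = (t + s) % n               # circular-buffer offset for this unit
                scores[s] += ca * mean[k] + cb * mean_sq[k]
    return scores, sets


rng = random.Random(1)
n = 32
a = [rng.uniform(-1, 1) for _ in range(n)]      # placeholder coefficients
b = [rng.uniform(-1, 1) for _ in range(n)]
mean = [rng.uniform(0, 31) for _ in range(n)]   # placeholder averaged samples
mean_sq = [m * m + rng.uniform(0, 4) for m in mean]
scores, sets = broadcast_scores(a, b, mean, mean_sq)
print(sets)  # 4
```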

The repeated S0 codes are followed by the SFD (S1, S2, S3, S4, S5), where S0 is an all-zero code with the same length as S1. The baseband searches for the SFD by correlating with all expected length-5 code sequences stored in a codebook. This is achieved via an inner quadratic correlation with S0 and S1, and an outer linear correlation that accumulates the inner results for all sequences in the codebook. Inner results are stored in a serial shift register (Fig. 14.2.4). Two pairs of QCORRs are used in a time-interleaved fashion to avoid throughput loss or extensive buffering. The search continues until the SFD has the highest correlation or a pre-specified timeout is reached. In the high-SNR regime, the circuit can also be programmed to search for a length-1 SFD to reduce preamble duration.
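A minimal model of the two-level SFD search, assuming each code period delivers one pair of inner quadratic-correlation scores (one per template, labeled 0 and 1 here) and that a codebook entry's outer score is the sum of the inner scores it selects. The codebook, labels, and `sfd_search` name are hypothetical, and the time-interleaved QCORR pairs are not modeled.

```python
from collections import deque
from typing import Iterable, Optional, Sequence, Tuple


def sfd_search(inner_stream: Iterable[Tuple[float, float]],
               codebook: Sequence[Tuple[int, ...]],
               sfd_index: int,
               timeout: int) -> Optional[int]:
    """Return the code period at which codebook[sfd_index] has the highest
    outer correlation, or None if `timeout` periods elapse first."""
    reg = deque(maxlen=len(codebook[0]))  # serial shift register of inner results
    for period, inner in enumerate(inner_stream):
        if period >= timeout:
            return None
        reg.append(inner)
        if len(reg) < reg.maxlen:
            continue
        # Outer linear correlation: accumulate the inner score each entry selects.
        outer = [sum(reg[i][label] for i, label in enumerate(entry))
                 for entry in codebook]
        if max(range(len(codebook)), key=outer.__getitem__) == sfd_index:
            return period
    return None


# Toy example: preamble periods (label 0) followed by a hypothetical SFD 1,0,1,1,0.
codebook = [(0, 0, 0, 0, 0), (0, 0, 0, 0, 1), (0, 0, 0, 1, 0),
            (0, 0, 1, 0, 1), (0, 1, 0, 1, 1), (1, 0, 1, 1, 0)]
labels = [0] * 7 + [1, 0, 1, 1, 0]
stream = [(1.0, 0.0) if lab == 0 else (0.0, 1.0) for lab in labels]
print(sfd_search(stream, codebook, sfd_index=5, timeout=100))  # 11
```

Including the partially overlapping preamble/SFD sequences in the codebook lets them win while the SFD is still entering the shift register, so the SFD entry only wins once it is fully aligned.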

Unlike conventionally used linear matched filters, the proposed quadratic correlation receiver achieves near-optimum performance, and is robust to SNR estimation errors. The baseband achieves a ±1ns synchronization accuracy with 10dB lower SNR, or 11× shorter preambles, compared with repetition codes, and is within 1.5dB of the optimum ML receiver, as shown in Fig. 14.2.5. In the presence of SNR estimation errors, the baseband improves the ‘eye’ opening by 4dB compared with an ideal matched filter.

Implemented in 90nm CMOS, the digital baseband occupies 2.55mm² on an SoC with an integrated RF front end and 5b ADC [7]. The baseband operates at a core supply voltage as low as 0.55V. During detection and synchronization correlations, the baseband dynamically duty-cycles the RF front end to reduce system power. At a clock frequency of 32MHz, the chip can process an entire preamble in a minimum of 14µs and consumes an average of 1.6mW. A summary of results is shown in Fig. 14.2.6. A die photo is shown in Fig. 14.2.7.

Acknowledgments:
This work is funded by DARPA HI-MEMS program (Contract # FA8650-07-C-7704). The authors thank STMicroelectronics for chip fabrication and Nathan Ickes for testing support.

References:
Figure 14.2.1: Non-coherent UWB block diagram, with illustrated packet structure and alias-free codes.

Figure 14.2.3: Phase-correlation tile with 8 QCORRs.

Figure 14.2.5: Synchronization error rates (SERs) for ±1ns accuracy with ideal and mismatched SNRs.

Figure 14.2.7: Die micrograph of the IR-UWB digital baseband processor.