Abstract — This paper reports the first large-scale, dense sub-THz heterodyne array featuring: (1) compactness of units: two interleaved 4×4 arrays with λ/2 unit pitch are integrated in a 1.2-mm² space; (2) multi-functionality of circuits: each array unit simultaneously achieves LO generation, inter-unit synchronization, incident wave coupling and down-conversion; (3) global phase locking: a strongly-coupled 2D LO network is phase locked to a 75-MHz reference, enabling phase-coherent pairing with off-chip transmitters; (4) scalability: array scale increase and lower LO phase noise are facilitated by eliminating the global routing and power sharing of sub-THz LO; (5) parallelism: beam forming from the two sub-arrays can be processed concurrently. A chip prototype using bulk 65-nm CMOS technology is implemented with 980-mW DC power. Array-wide 240-GHz LO locking is achieved with a measured phase noise of -84 dBc/Hz (1-MHz offset). The measured sensitivity (BW=1 kHz) of a single unit is 38.8-pW. This is >6x improvement over prior sub-THz arrays with similar scale and density but without phase-detection capability.

Index Terms — heterodyne sub-THz receiver, high-density scalable array, compact electromagnetic design, CMOS

I. INTRODUCTION

Beam-steerable receivers are in high demand in radar and imaging systems. There is a growing interest in pushing their operation frequencies towards the THz band, so that multi-antenna configurations with narrow-beam response can be realized at chip scale. This, however, calls for disruptive changes in traditional THz receiver architectures including square-law self-mixing detector arrays [1]–[3] and active mixer arrays [4], [5]. The former scheme, despite potentially high sensor unit density and large aperture, is unable to detect phase for back-end analog/digital beam forming. The latter provides phase information and higher sensitivity, but current heterodyne-array architectures have very poor scalability for beam narrowing: only a few receiver units can be integrated at present. This is mainly due to a centralized LO generator which then distributes the LO power to a mixer array for LO coherence among units [5]. However, loss and mismatch of such high-frequency global routing increase rapidly as the array scales up. With larger array size, the LO power shared by each unit also decreases linearly: this, along with the inherently high phase noise of THz LO, leads to significant degradation of sensitivity.

In this paper, high scalability of the heterodyne array is achieved by replacing centralized LO generation/distribution with de-centralized intra-unit LO generation and a 2D coupled LO network. One major challenge for this architecture is that each unit should fit into a tight λ/2×λ/2 area to suppress side lobes in beam forming, which makes the integration of a mixer, a local oscillator and an antenna within a unit extremely difficult. In our 240-GHz heterodyne receiver, this problem is addressed with highly compact units. This enables the integration of two interleaved 4×4 phase-locked sub-arrays in a 1.2-mm² (2λ×2λ) area, showing for the first time the feasibility of a single-chip implementation of a phase-sensitive sub-THz receiver array with high density and large scale.

II. OVERVIEW OF THE ARCHITECTURE

Fig. 1 shows the receiver architecture with an array of N=4×4 cells. Each cell occupies an area of (λRF/2)² (fRF=240 GHz) and is comprised of two identical units placed symmetrically. The core component of the unit is a self-oscillating harmonic mixer (SOHM), which simultaneously (1) generates a high-power 120-GHz LO signal and (2) down-mixes the RF signal using the non-linearity of the oscillating transistors. The SOHM is connected to both an intra-unit slot antenna for RF receiving and a CPW/slotline mesh for strong LO coupling with neighboring SOHMs. The ensemble of coupled units also behaves as the VCO of an on-chip PLL, which locks all LOs to an external clock (~75 MHz). At one edge of the array, a small portion of the LO power is diverted to a 120-GHz injection-locked frequency divider (ILFD), which then works with other PLL blocks to dynamically adjust the LO frequency/phase through a global control signal Vctrl.

A larger array no longer lowers the LO power of each unit, and in fact lowers the LO phase noise by 10log10(2N) dB through averaging. Excellent scalability is hence achieved. Antenna patterns can be synthesized from the 32 IF signals. The upper halves of all cells form one 4×4 sub-array (pitch=λ/2), while the bottom halves of the cells form another 4×4 sub-array. The two sub-arrays can be processed independently, enabling concurrent forming of two beams and doubling the scanning speed.

III. DESIGN OF SENSOR UNIT

An important trait of our sensor unit design is its compactness, which is achieved by the multi-functionality of transmission lines. Fig. 2(a) shows the equivalent circuit of the lower unit in a cell. It is coupled with its neighboring units through CPW lines along the left, right and top boundaries as well as slotlines along the bottom boundary. The SOHM oscillates at $f_0 \approx 120$ GHz, and the 2nd harmonic ($2f_0 \approx 240$ GHz) is used as LO. Topologically, the SOHM can be regarded as two self-feeding oscillators [6] coupled by a central slotline $TL_2$. First, we describe the operation of the oscillator. The quasi-TE mode of the signal in $TL_2$ forces the two oscillators to oscillate differentially. $TL_2$ and CPW line $TL_1$ (with a common impedance of 55 $\Omega$ and a total length of 79° at $f_0$) are used to introduce positive feedback and adjust $\angle V_{\text{drain}}/V_{\text{gate}}$ to maximize power generation and mixer conversion gain. Slotline $TL_3$ (135° at $f_0$) and $TL_4$ (60° at $f_0$) are in shunt and together serve as the resonator of the oscillator\(^1\). The $2f_0$ waves across MOSFET drains and sources are, however, in-phase and thus evanescent in slotlines $TL_2$ and $TL_5$. Therefore, unlike the $f_0$ wave present in all transmission lines, the generated $2f_0$ LO signal is confined within the MOSFETs for efficient frequency mixing. In Section IV, we show that radiation from the slotlines at both $f_0$ and $2f_0$ is canceled. Lastly, a varactor pair is used to tune $f_0$ ($\Delta f_0 = 1.2$ GHz) for PLL locking. The simulated DC power consumption of the oscillator pair is 43.2 mW.

Next, we describe how the oscillator can simultaneously down-convert the RF signal. $TL_4$ and $TL_5$ consist of a straight slot ($\sim 90^\circ$ at $2f_0$) and a meander slot ($\sim 30^\circ$ at $2f_0$, not shown in Fig. 2(a)). The straight slot pair behaves as a dipole slot antenna at $f_{RF} \approx 2f_0$, while the meandering parts are used for antenna impedance matching. The received signal in common mode is injected into the MOSFET gates through $TL_1$ and $TL_1'$. Due to the high non-linearity under large-power drive at the gate, the RF signal is down-converted to $f_{IF} = f_{RF} - 2f_0$. The IF signal is extracted from the MOSFET drains, where a virtual ground for differential oscillation is formed, leaving the oscillator undisturbed. To prevent the excitation of substrate waves, a hemispheric silicon lens must be attached to the back of the chip. From simulation, the peak directivity and efficiency of the antenna are 4.8 dBi and 40%, respectively. The simulated noise figure (NF) is 46.5 dB (flicker noise limited) at $f_{IF} = 5$ MHz and 19.3 dB at $f_{IF} = 100$ MHz, based on the simulated noise of $V_{IF}$ with 50-$\Omega$ load in Fig. 2(b).

IV. COUPLED SENSOR ARRAY

Strong coupling and synchronization of LOs in the heterodyne receiver array is critical. The E-field distribution of a cell at $f_0$, shown in Fig. 3(a), reveals how that is achieved. At the left/right boundaries between cells, wave synchronization is enforced by the coplanar-waveguide (CPW) mode. At the top/bottom boundaries, wave synchronization is enforced by the quasi-TE mode of the resonator slots shared by the cells. Inside each cell, the two receiver units are also strongly coupled through the horizontal CPW line in the center. As a result, all 32 units share identical LO phase.

\(^{1}\)In Fig. 2(a), all blue wires are virtual ground at $f_0$, hence are treated as connected at $f_0$. $TL_2$ is only 6° long, so at $f_0$, $TL_3$ (capacitive) and $TL_4$ (inductive) are considered to be in shunt and form a resonance.

Slotlines in general have high radiative loss; fortunately, in our design, potential radiation at \( f_0 \) is mostly canceled due to the out-of-phase wave formation in horizontally adjacent slots (Fig. 3(a)). The generated LO signal at \( 2f_0 \), although mostly confined within the MOSFETs, may still leak to the dipole slot antennas through \( C_{gd} \) of the devices. However, as shown in Fig. 3(b), the radiation of the \( 2f_0 \) leakage from the two units of a cell is out-of-phase, and hence is again suppressed. The simulated peak antenna gains of the \( f_0 \) and \( 2f_0 \) signals are only -58 dBi and -24 dBi, respectively.

![Fig. 3. E-field distribution of (a) generated waves at \( f_0 \), (b) generated waves at \( 2f_0 \), and (c) incident waves \( f_{RF} \approx 2f_0 \).](image)

Fig. 3. E-field distribution of (a) generated waves at \( f_0 \), (b) generated waves at \( 2f_0 \), and (c) incident waves \( f_{RF} \approx 2f_0 \).

![Fig. 4. Simulated beam-steering patterns of a 4×4 sub-array.](image)

Fig. 4. Simulated beam-steering patterns of a 4×4 sub-array in (a) the E-plane and (b) the H-plane.

Fig. 3(c) shows the field distribution for the incident RF signal \( f_{RF} \approx 2f_0 \). Since the signal is from one far-field source, the directions of the E-field in the two units of a cell are the same, which is contrary to the above case for the \( 2f_0 \) LO. Therefore, the two down-converted IF signals are out-of-phase. For each 4×4 sub-array, IF signals can be used for beam-forming with back-end processing of their relative phases and amplitudes. Fig. 4 gives the simulated antenna patterns synthesized to point to various directions.
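The back-end beam forming described above can be illustrated with an idealized array-factor calculation. The following is only a sketch under stated assumptions: it treats the 16 units of one sub-array as isotropic elements on a uniform 4×4 grid with λ/2 pitch, and ignores the element pattern, the silicon lens and any amplitude weighting. The function name `array_factor` and its parameters are hypothetical and do not come from the paper.

```python
import cmath
import math


def array_factor(theta_deg, phi_deg, steer_theta_deg, steer_phi_deg,
                 n=4, pitch_wl=0.5):
    """Normalized |AF| of an ideal n x n array (pitch given in wavelengths).

    Assumption: isotropic elements and uniform amplitudes; the phase weights
    are the conjugate of the steering-direction phases, as they would be
    applied to the IF signals in back-end processing.
    """
    def direction_cosines(theta, phi):
        t, p = math.radians(theta), math.radians(phi)
        return math.sin(t) * math.cos(p), math.sin(t) * math.sin(p)

    u, v = direction_cosines(theta_deg, phi_deg)
    u0, v0 = direction_cosines(steer_theta_deg, steer_phi_deg)

    total = 0j
    for ix in range(n):
        for iy in range(n):
            # Residual phase of element (ix, iy) after applying the steering weight
            phase = 2 * math.pi * pitch_wl * (ix * (u - u0) + iy * (v - v0))
            total += cmath.exp(1j * phase)
    return abs(total) / (n * n)


# Response at the steered direction, and at broadside when steered to 30 degrees
peak = array_factor(30.0, 0.0, 30.0, 0.0)
off_beam = array_factor(0.0, 0.0, 30.0, 0.0)
```

For a broadside beam, the first null of this ideal pattern falls where sin θ = 1/(n · pitch/λ) = 0.5, i.e. at θ = 30°, which gives a feel for the beamwidth available from a 4×4 sub-array.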

V. ON-CHIP LO PHASE LOCKING

The right part of Fig. 1 shows the array-ILFD interface at the bottom slot edge of the array. An additional parallel slotline is built to magnetically couple out the 120-GHz LO signal. Impedance mismatch in the coupling is intentionally introduced, so that only 100 µW of LO power is injected into the switching transistor \( M_1 \) of the ILFD. This is to avoid significant disturbance to the oscillation and symmetry of the LO array. The simulated ILFD locking range is 4.2 GHz. The output signal of the ILFD at \( f_0/4 \approx 30 \) GHz is delivered to subsequent digital dividers (÷400). All other on-chip PLL components are standard.
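The frequency plan of the PLL can be checked with a few lines of arithmetic. The sketch below assumes the divider chain consists of the divide-by-4 ILFD followed by a digital divide-by-400, which is consistent with the \( f_{LO}/3200 \) lock point quoted for Fig. 6; the function and constant names are illustrative only.

```python
ILFD_RATIO = 4               # ILFD output is at f0/4 (about 30 GHz)
DIGITAL_DIVIDER_RATIO = 400  # assumed total ratio of the digital dividers


def lo_from_reference(f_ref_hz):
    """Return (f0, f_LO) in Hz for a given PLL reference frequency."""
    f0 = f_ref_hz * ILFD_RATIO * DIGITAL_DIVIDER_RATIO
    return f0, 2 * f0  # the 2nd harmonic of f0 serves as the LO


def reference_for_f0(f0_hz):
    """Reference frequency in Hz needed to lock the oscillators at f0."""
    return f0_hz / (ILFD_RATIO * DIGITAL_DIVIDER_RATIO)


f0_nominal, f_lo_nominal = lo_from_reference(75e6)  # nominal 75-MHz reference
ref_low = reference_for_f0(116.48e9)    # lower edge of the measured lock range
ref_high = reference_for_f0(117.44e9)   # upper edge of the measured lock range
```

With these ratios, a 75-MHz reference corresponds to \( f_0 = 120 \) GHz and \( f_{LO} = 240 \) GHz, and the measured locking range of 116.48 to 117.44 GHz corresponds to reference frequencies of 72.8 to 73.4 MHz.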

VI. MEASUREMENT RESULTS

The chip is fabricated using a 65-nm CMOS technology. Fig. 5 shows the die photo and packaging details. The 32-receiver-unit array alone occupies only 1.2-mm\(^2\) area. The measured DC power of the entire chip is 0.98 W.

![Fig. 5. (a) The chip micrograph, and (b) the PCB with packaging.](image)

![Fig. 6. Spectrum of (a) the output from on-chip divider chain (locked at \( f_{LO}/3200 \)) with RBW = 10 Hz, and (b) the IF of a unit (Row 1, Col. 2) with RBW = 100 Hz.](image)

Fig. 6. Spectrum of (a) the output from on-chip divider chain (locked at \( f_{LO}/3200 \)) with RBW = 10 Hz, and (b) the IF of a unit (Row 1, Col. 2) with RBW = 100 Hz.

![Fig. 7. Setups for testing (a) IF signal, (b) leaked LO at \( 2f_0 \).](image)

Fig. 7. Setups for testing (a) IF signal, (b) leaked LO at \( 2f_0 \).

Measurement of the divider-chain output (at \( f_{LO}/3200 \), spectrum shown in Fig. 6(a)) in the PLL indicates that the PLL locking range for \( f_0 \) is 116.48 to 117.44 GHz (i.e. 232.96 to 234.88 GHz for \( f_{LO} \)). Next, Fig. 7(a) shows the setup for IF signal measurement using a VDI WR-3.4 VNA frequency extender as the radiation source, with a total radiated power of -7.1 dBm (calibrated by a PM5 power-meter) and an antenna gain of 24 dBi. From a far-field distance of 10 cm, the radiation is coupled into the CMOS chip through a hemispheric silicon lens (1-cm diameter) attached to the chip back. Two synchronized signal generators (Keysight E8257D and HP 83732B) are used to provide the input reference signals of the chip and the radiation source. Chip output IF signals are multiplexed and amplified (50-Ω input and calibrated gain of 49 dB) off-chip. Radiation frequency \( f_{RF} \) is tuned so that \( f_{IF} \) is at 4.6 MHz. All measured IF signals (with highest $P_{IF}$ of -32.3 dBm) are also locked to 4.6 MHz and exhibit a spectral profile similar to that shown in Fig. 6(b). When the reference $f_{LO}/3200$ is shifted, the $f_{IF}$ of all pixels shifts to the same new expected value, which indicates array-wide locking. When the radiation source is off, the amplified IF noise floor is -95 dBm/Hz, hence -144 dBm/Hz at chip output, matching well with the simulated noise PSD in Fig. 2(b).

The receiving antenna pattern of a sensor unit is measured by rotating the chip. Results are shown in Fig. 8. The measured peak directivity $D_{RX}$ is 5.5 dBi; end-fire responses are lower than in simulations because the radiation is blocked by the lens fixture. According to the measured $D_{RX}$, the power injected into a unit is -41.4 dBm, resulting in an estimated receiver conversion loss (amplifier gain de-embedded) of 39.9 dB and a NF of 42.6 dB at 100-MHz IF (because the bandwidth of the multiplexer is limited, the simulated noise PSD at 100 MHz in Fig. 2(b) is used). The discrepancy between the simulated and measured NF may be due to the underestimated loss at the chip-lens interface and the overestimated oscillation power. The sensitivity with 4.6-MHz IF and a practical 1-kHz bandwidth is 38.8 pW.$^2$
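The quoted link-budget numbers can be approximately reproduced from the values given in this section. The sketch below is one possible reading, not necessarily the authors' exact procedure: it assumes a Friis free-space link at 234 GHz (a frequency inside the measured LO locking range), and it assumes that the simulated 40% antenna efficiency is applied on top of the measured directivity. The 174 dBm/Hz thermal noise floor used for the NF-based projection is the usual room-temperature value. All variable names are illustrative.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def dbm_to_pw(p_dbm):
    """Convert power from dBm to picowatts (1 mW = 1e9 pW)."""
    return 10 ** (p_dbm / 10) * 1e9


def friis_received_dbm(p_tx_dbm, g_tx_dbi, d_rx_dbi, efficiency, freq_hz, dist_m):
    """Received power for a free-space link (Friis equation, in dB form)."""
    path_loss_db = 20 * math.log10(4 * math.pi * dist_m * freq_hz / C)
    return (p_tx_dbm + g_tx_dbi + d_rx_dbi
            + 10 * math.log10(efficiency) - path_loss_db)


AMP_GAIN_DB = 49.0             # calibrated off-chip amplifier gain
BW_DB = 10 * math.log10(1e3)   # 1-kHz detection bandwidth

# Assumptions: 234-GHz carrier, 40% (simulated) antenna efficiency
p_rx_dbm = friis_received_dbm(-7.1, 24.0, 5.5, 0.40, 234e9, 0.10)

p_if_chip_dbm = -32.3 - AMP_GAIN_DB         # highest IF power at chip output
conversion_loss_db = p_rx_dbm - p_if_chip_dbm

noise_chip_dbm_hz = -95.0 - AMP_GAIN_DB     # noise floor referred to chip output
# Input power that yields unity SNR in a 1-kHz bandwidth at 4.6-MHz IF
sensitivity_pw = dbm_to_pw(noise_chip_dbm_hz + BW_DB + conversion_loss_db)

# Projection for 100-MHz IF from the estimated NF of 42.6 dB
projected_pw = dbm_to_pw(-174.0 + 42.6 + BW_DB)
```

Under these assumptions the calculation gives roughly -41.4 dBm of injected power, a 39.9-dB conversion loss, a sensitivity close to 38.8 pW at 4.6-MHz IF, and about 0.072 pW projected for 100-MHz IF, in line with the values reported in the text and in TABLE I.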

Fig. 8. The sensor antenna pattern in (a) E-plane and (b) H-plane.

Fig. 9. Measured (a) spectrum and (b) phase noise of LO at 2$f_0$.

As mentioned in Section II, the coupling among 32 units lowers the LO phase noise of a single unit by $10\log_{10} 32 \approx 15$ dB. Due to non-ideal radiation cancellation, there is weak radiation leakage of the LO at 2$f_0$, which is detected by a VDI extender receiver (Fig. 7(b)). The down-converted spectrum of the 240-GHz LO is shown in Fig. 9(a), with a phase noise (Fig. 9(b)) of -84 dBc/Hz ($\Delta f=1$ MHz). This is 22 dB lower than that of the LO in [5] normalized to 240 GHz.
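The two phase-noise figures above follow from simple dB arithmetic, sketched below. The coupling improvement uses the $10\log_{10}$ relation stated in the text. The frequency normalization uses the standard $20\log_{10}(f_2/f_1)$ scaling of phase noise under ideal frequency multiplication, which is assumed here to be the normalization applied to [5]. Function names are hypothetical.

```python
import math


def coupling_improvement_db(num_units):
    """Phase-noise reduction from coupling num_units identical oscillators."""
    return 10 * math.log10(num_units)


def normalize_phase_noise(pn_dbc_hz, f_from_hz, f_to_hz):
    """Scale phase noise from one carrier frequency to another.

    Assumption: ideal frequency scaling, i.e. 20*log10 of the frequency ratio.
    """
    return pn_dbc_hz + 20 * math.log10(f_to_hz / f_from_hz)


improvement_db = coupling_improvement_db(32)  # 32 coupled units in the array

# Measured phase noise at 2*f0, referred back to the f0 oscillation
pn_at_f0 = normalize_phase_noise(-84.0, 240e9, 120e9)

# Normalized phase noise of [5] implied by the stated 22-dB difference
pn_ref5_normalized = -84.0 + 22.0
```

The first result is about 15.05 dB, which the text rounds to 15 dB. Referring the measured -84 dBc/Hz back to $f_0$ gives about -90 dBc/Hz, since halving the carrier frequency lowers phase noise by about 6 dB under this scaling.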

VII. CONCLUSION

A comparison with other sensors at similar frequencies (TABLE I) shows that this work improves the sensitivity of the multi-cell heterodyne receiver [5] by $\sim2x$ (an additional $\sim500x$ improvement is expected with >100-MHz IF). More importantly, our array, equipped with highly versatile array units, achieves $\lambda/2$-pitch density and large array scale that fully match those of homodyne sensor arrays (but with $>6x$ higher sensitivity and phase detection) for the first time, paving the way for ultra-high-resolution beam forming at sub-THz and potentially THz frequencies.

ACKNOWLEDGMENT

This work is funded by TSMC and by Singapore-MIT Alliance for Research and Technology (SMART).

REFERENCES


TABLE I. COMPARISON OF SUB-THZ RECEIVERS

<table>
<thead>
<tr>
<th>References</th>
<th>This Work</th>
<th>[5]</th>
<th>[1]</th>
<th>[2]</th>
<th>[3]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Detection Method</td>
<td colspan="2">Heterodyne</td>
<td colspan="3">Square-Law Detector</td>
</tr>
<tr>
<td>Array Size</td>
<td>4x8</td>
<td>8</td>
<td colspan="3">4x4</td>
</tr>
<tr>
<td>Array Scalability?</td>
<td>Yes</td>
<td>No</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr>
<td>RF Freq. (GHz)</td>
<td>240</td>
<td>320</td>
<td>280</td>
<td>320</td>
<td>280</td>
</tr>
<tr>
<td>Sensitivity (pW)</td>
<td>38.8* (0.072$^1$)</td>
<td>71.4</td>
<td>917</td>
<td>1080</td>
<td>250</td>
</tr>
<tr>
<td>DC Power (mW)</td>
<td>980</td>
<td>117</td>
<td>6</td>
<td>38</td>
<td>180</td>
</tr>
<tr>
<td>Chip Area (mm$^2$)</td>
<td>2.80</td>
<td>3.06</td>
<td>5.76</td>
<td>6.76</td>
<td>6.25</td>
</tr>
<tr>
<td>Technology</td>
<td>65nm CMOS</td>
<td>130nm SiGe</td>
<td>130nm CMOS</td>
<td>180nm SiGe</td>
<td>130nm SiGe</td>
</tr>
</tbody>
</table>

* Received $P_{RF}$ to get unity SNR (BW=1 kHz). $^*$ A silicon lens is used in the measurement.

$^1$ Projected based on the measured conversion loss and the simulated white noise at IF=100 MHz.