Real-Time and Scalable Zak-OTFS Receiver Processing on GPUs

📅 2026-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational complexity of Orthogonal Time Frequency Space (OTFS) modulation in delay–Doppler domain processing, which hinders real-time implementation despite its robustness in high-mobility scenarios. The authors propose a hardware–algorithm co-designed GPU-accelerated Zak-OTFS receiver that exploits the structured sparsity of delay–Doppler channels. By integrating compact matrix operations, branchless iterative equalization, and a compute-aware architecture, the design substantially reduces both computational and memory overheads. The implementation achieves a throughput of 906.52 Mbps on a 16384×32 grid with 16QAM and 245.76 MHz bandwidth, meeting real-time processing deadlines at the 99.9th percentile latency. Extensive evaluations across multiple platforms demonstrate excellent scalability and robustness.
📝 Abstract
Orthogonal time frequency space (OTFS) modulation offers superior robustness to high-mobility channels compared to conventional orthogonal frequency-division multiplexing (OFDM) waveforms. However, its explicit delay-Doppler (DD) domain representation incurs substantial signal processing complexity, especially as the DD domain grid size increases. To address this challenge, we present a scalable, real-time Zak-OTFS receiver architecture on GPUs through hardware–algorithm co-design that exploits DD-domain channel sparsity. Our design leverages compact matrix operations for key processing stages, a branchless iterative equalizer, and a structured sparse representation of the DD domain channel matrix to significantly reduce computational and memory overhead. These optimizations enable low-latency processing that consistently meets the 99.9th-percentile real-time processing deadline. The proposed system achieves up to 906.52 Mbps throughput with a DD grid size of (16384, 32) using 16QAM modulation over 245.76 MHz bandwidth. Extensive evaluations under a Vehicular-A channel model demonstrate strong scalability and robust performance across CPU (Intel Xeon) and multiple GPU platforms (NVIDIA Jetson Orin, RTX 6000 Ada, A100, and H200), highlighting the effectiveness of compute-aware Zak-OTFS receiver design for next-generation (NextG) high-mobility communication systems.
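To put the reported numbers in context, the following is a rough back-of-the-envelope sketch, not taken from the paper. It assumes the usual Zak-OTFS time-bandwidth relation, in which an M×N DD grid occupies a frame of duration MN/B at bandwidth B, and it assumes every grid point carries a data symbol (no pilots or guard regions). The function name `zak_otfs_frame_budget` is hypothetical.

```python
def zak_otfs_frame_budget(m, n, bandwidth_hz, bits_per_symbol):
    """Estimate frame size, frame duration, and raw bit rate for an M x N DD grid.

    Assumptions (not stated in the abstract):
      - time-bandwidth product B * T equals M * N, so T = M * N / B
      - every DD grid point carries one data symbol (no pilot/guard overhead)
    """
    symbols = m * n                          # DD grid points per frame
    frame_s = symbols / bandwidth_hz         # frame duration in seconds
    raw_bps = symbols * bits_per_symbol / frame_s  # bits per second on air
    return symbols, frame_s, raw_bps


# Values quoted in the abstract: (16384, 32) grid, 16QAM (4 bits), 245.76 MHz.
symbols, frame_s, raw_bps = zak_otfs_frame_budget(16384, 32, 245.76e6, 4)
print(f"symbols per frame: {symbols}")
print(f"frame duration:    {frame_s * 1e3:.3f} ms")
print(f"raw bit rate:      {raw_bps / 1e6:.2f} Mbps")
```

Under these assumptions a frame holds 524,288 symbols and lasts about 2.13 ms, which gives a sense of the per-frame time budget a real-time receiver would have to meet. The raw rate works out to 983.04 Mbps, so the reported 906.52 Mbps sits below that ceiling; the abstract does not say how the gap arises.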
Problem

Research questions and friction points this paper is trying to address.

OTFS
delay-Doppler domain
signal processing complexity
real-time processing
high-mobility channels
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zak-OTFS
delay-Doppler domain
GPU acceleration
structured sparsity
real-time processing
Authors
Junyao Zheng
Department of Electrical and Computer Engineering, Duke University, NC 27708, USA
Chung-Hsuan Tung
Department of Electrical and Computer Engineering, Duke University, NC 27708, USA
Yuncheng Yao
Department of Computer Science, Duke University, NC 27708, USA
Nishant Mehrotra
Duke University
joint sensing and communication, millimeter-wave sensing, wireless systems, information theory
Sandesh Mattu
Department of Electrical and Computer Engineering, Duke University, NC 27708, USA
Zhenzhou Qi
Duke University
vRAN, Heterogeneous Computing, Wireless Network System, Computer Network System
Danyang Zhuo
Duke University
Distributed Systems, Networking, Operating Systems
Robert Calderbank
Department of Electrical and Computer Engineering, Duke University, NC 27708, USA
Tingjun Chen
Nortel Networks Assistant Professor of Electrical and Computer Engineering, Duke University
Wireless Networks, Optical Networks, Mobile Computing, IoT, Testbeds