🤖 AI Summary
This work addresses the challenges of poor scalability and slow convergence in probabilistic computing hardware for MIMO detection, a densely connected NP-hard problem. The authors propose a graph sparsification-based replicated-variable approach combined with a two-dimensional parallel tempering (2D-PT) algorithm, implemented as a fully on-chip parallel p-bit solver on an FPGA. By enabling simultaneous replica exchanges across both the temperature and constraint dimensions, the architecture converges significantly faster without requiring manual parameter tuning. Evaluated on a 128-node system, the solver achieves an end-to-end latency of only 4.7 ms with a bit error rate better than that of conventional linear detectors. Estimates for a projected 7 nm ASIC implementation indicate power consumption below 200 mW at an operating frequency of approximately 90 MHz, offering an efficient and scalable hardware solution for dense combinatorial optimization problems.
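The p-bit dynamics the summary refers to can be sketched with a standard Gibbs update over bipolar spins; this is a minimal illustration assuming the usual Ising formulation, not the paper's hardware implementation. The coupling matrix `J` would, in the sparsified problem, also carry the ferromagnetic couplings that tie each dense node's copy variables together; all names here are illustrative.

```python
import numpy as np

def pbit_sweep(J, h, m, beta, rng):
    """One sequential Gibbs sweep over bipolar p-bits (m_i in {-1, +1}).

    Each p-bit samples from its local field I_i = sum_j J_ij * m_j + h_i
    with P(m_i = +1) = (1 + tanh(beta * I_i)) / 2. In a sparsified graph,
    J would include the ferromagnetic copy couplings (+lambda) introduced
    by splitting dense nodes (an assumption for illustration).
    """
    for i in range(len(m)):
        I = J[i] @ m + h[i]
        m[i] = 1 if rng.random() < 0.5 * (1.0 + np.tanh(beta * I)) else -1
    return m

# Tiny demo: two uncoupled p-bits driven by strong opposite biases.
J = np.zeros((2, 2))
h = np.array([10.0, -10.0])
m = np.array([1, 1])
rng = np.random.default_rng(0)
m = pbit_sweep(J, h, m, beta=10.0, rng=rng)
# With beta * |I| this large, each p-bit locks to the sign of its bias.
```

At low `beta` (high temperature) the update is nearly random, which is what the parallel-tempering replicas at different temperatures exploit.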
📝 Abstract
Probabilistic computers built from p-bits offer a promising path for combinatorial optimization, but the dense connectivity required by real-world problems scales poorly in hardware. Here, we address this through graph sparsification with auxiliary copy variables and demonstrate a fully on-chip parallel tempering solver on an FPGA. Targeting MIMO detection, a dense, NP-hard problem central to wireless communications, we fit 15 temperature replicas of a 128-node sparsified system (1,920 p-bits) entirely on-chip and achieve bit error rates significantly below those of conventional linear detectors. We report complete end-to-end solution times of 4.7 ms per instance, with all loading, sampling, readout, and verification overheads included. ASIC projections in 7 nm technology indicate about 90 MHz operation with less than 200 mW power dissipation, suggesting that massive parallelism across multiple chips could approach the throughput demands of next-generation wireless systems. However, sparsification makes convergence sensitive to the copy-constraint strength. Employing two-dimensional parallel tempering (2D-PT), which exchanges replicas across both the temperature and constraint dimensions, we demonstrate over 10× faster convergence without manual parameter tuning. These results establish an on-chip p-bit architecture and a scalable algorithmic framework for dense combinatorial optimization.
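The two exchange moves of 2D-PT can be sketched as Metropolis swap criteria along each axis of the replica grid. This is a hedged sketch: the energy decomposition `E(m; lam) = E0(m) + lam * C(m)`, with `C` the copy-mismatch penalty, and all function names are illustrative assumptions, not the paper's implementation.

```python
import math
import random

def accept_swap_temperature(beta_a, E_a, beta_b, E_b, rng):
    """Metropolis acceptance for exchanging two replicas that differ only
    in inverse temperature (the standard parallel-tempering move)."""
    delta = (beta_a - beta_b) * (E_a - E_b)
    return delta >= 0 or rng.random() < math.exp(delta)

def accept_swap_constraint(beta, lam_a, C_a, lam_b, C_b, rng):
    """Metropolis acceptance for exchanging two replicas at the same
    temperature that differ only in copy-constraint strength lambda,
    assuming the energy splits as E(m; lam) = E0(m) + lam * C(m),
    where C is the copy-mismatch penalty (an assumed decomposition)."""
    delta = beta * (lam_a - lam_b) * (C_a - C_b)
    return delta >= 0 or rng.random() < math.exp(delta)

rng = random.Random(0)
# A favorable temperature swap (delta > 0) is always accepted.
ok = accept_swap_temperature(1.0, 5.0, 0.5, 2.0, rng)     # delta = +1.5
# A strongly unfavorable swap is accepted only with tiny probability.
bad = accept_swap_temperature(1.0, 2.0, 0.5, 102.0, rng)  # delta = -50
```

In a 2D-PT schedule, swap attempts would alternate between neighboring rows (temperature axis) and neighboring columns (constraint axis) of the replica grid, so a configuration can diffuse across both dimensions without hand-tuning a single constraint strength.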