🤖 AI Summary
This work addresses the scalability limitations of large-scale Ising problem solvers, which are hindered by the physical constraints of analog solvers and the high latency introduced by conventional CPU-based decomposition. The authors propose a tightly coupled heterogeneous architecture that, for the first time, offloads the Ising problem decomposition task onto an FPGA, operating in concert with a custom 28nm analog Ising solver. Leveraging the FPGA’s reconfigurable parallel processing capabilities, the system substantially reduces communication latency and enables efficient hardware-software co-design. Compared to an optimized CPU-based software baseline, the proposed system demonstrates nearly a 2× speedup and improves energy efficiency by over two orders of magnitude.
📝 Abstract
Emerging analog computing substrates, such as oscillator-based Ising machines, offer rapid convergence times for combinatorial optimization but often suffer from limited scalability due to physical implementation constraints. To tackle real-world problems involving thousands of variables, problem decomposition is required; however, performing this step on standard CPUs introduces significant latency, preventing the high-speed solver from operating at full capacity. This work presents a heterogeneous system that offloads the decomposition workload to an FPGA, tightly integrated with a custom 28nm Ising solver. By migrating the decomposition logic to reconfigurable hardware and utilizing parallel processing elements, the system minimizes the communication latency typically associated with host-device interactions. Our evaluation demonstrates that this co-design approach effectively bridges the speed gap between digital preprocessing and analog solving, achieving a nearly 2× speedup and an energy efficiency improvement of over two orders of magnitude compared to optimized software baselines running on modern CPUs.
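To make the decomposition step concrete: a common way to split a large Ising instance is to fix most spins, fold their couplings into effective local fields, and hand each small subproblem to the fast solver. The sketch below is an illustration of this general idea only, not the authors' FPGA implementation; the function names are hypothetical, and a brute-force subsolver stands in for the analog chip.

```python
import numpy as np
from itertools import product

def ising_energy(J, h, s):
    # H(s) = -1/2 s^T J s - h^T s, with symmetric J and zero diagonal.
    return -0.5 * s @ J @ s - h @ s

def decompose_and_solve(J, h, sub_size=8, sweeps=10, seed=None):
    """Block decomposition: fix all spins outside a small block, fold
    their couplings into effective fields, and solve the block exactly
    (a brute-force stand-in for the fast analog solver)."""
    rng = np.random.default_rng(seed)
    n = len(h)
    s = rng.choice([-1, 1], size=n)  # random initial spin assignment
    for _ in range(sweeps):
        for start in range(0, n, sub_size):
            idx = np.arange(start, min(start + sub_size, n))
            rest = np.setdiff1d(np.arange(n), idx)
            # Couplings to the fixed spins act as extra local fields.
            h_eff = h[idx] + J[np.ix_(idx, rest)] @ s[rest]
            J_sub = J[np.ix_(idx, idx)]
            best_c, best_e = None, np.inf
            for cand in product([-1, 1], repeat=len(idx)):
                c = np.array(cand)
                e = -0.5 * c @ J_sub @ c - h_eff @ c
                if e < best_e:
                    best_e, best_c = e, c
            s[idx] = best_c  # commit the block's optimal assignment
    return s
```

In the paper's setting, the per-block preprocessing (building `h_eff` and `J_sub` for each subproblem) is the workload moved from the CPU onto the FPGA, where blocks can be prepared in parallel and streamed to the solver with minimal host-device round trips.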