🤖 AI Summary
Traditional thermal management techniques for many-core HPC processors suffer from constrained control bandwidth, computational and memory overhead that grows with the number of processing elements (PEs), and OS-induced jitter when the controller runs on the PEs themselves. To address these challenges, this paper proposes a lightweight hardware-software co-designed Model Predictive Control (MPC) framework. It prunes weak thermal couplings to reduce the memory needed for the model and uses ahead-of-time scheduling to solve the sparse triangular systems of the optimization problem in parallel. The operator-splitting quadratic programming (QP) solver runs on a dedicated embedded multi-core RISC-V controller, which enables fast, deterministic control. When controlling 144 PEs at 500 MHz, the approach achieves a control latency of 0.92 ms, 33× lower than a single-core baseline, with 7.9× higher energy efficiency. It consumes as little as 325 mW, requires less than 1 MiB of memory, and occupies less than 1.5% of a typical HPC processor's die area.
📝 Abstract
Managing energy and thermal profiles is critical for many-core HPC processors with hundreds of application-class processing elements (PEs). Advanced model predictive control (MPC) delivers state-of-the-art performance but requires solving an online optimization problem over a thousand times per second (1 kHz control bandwidth), with computational and memory demands scaling with PE count. Traditional MPC approaches execute the controller on the PEs, but operating system overheads create jitter and limit control bandwidth. Running MPC on dedicated on-chip controllers enables fast, deterministic control but raises concerns about area and power overhead. In this work, we tackle these challenges by proposing a hardware-software codesign of a lightweight MPC controller, based on an operator-splitting quadratic programming solver and an embedded multi-core RISC-V controller. Key innovations include pruning weak thermal couplings to reduce model memory and ahead-of-time scheduling for efficient parallel execution of the sparse triangular systems arising from the optimization problem. The proposed controller achieves sub-millisecond latency when controlling 144 PEs at 500 MHz, delivering 33x lower latency and 7.9x higher energy efficiency than a single-core baseline. With a compact memory footprint of less than 1 MiB, it consumes as little as 325 mW while occupying less than 1.5% of a typical HPC processor's die area.
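
To make the idea of ahead-of-time scheduling for sparse triangular systems concrete, the following is a minimal sketch of level scheduling, one common way to expose parallelism in a sparse lower-triangular solve. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say which scheduling scheme is used, and the names `build_levels` and `solve_lower`, the dictionary-based row storage, and the small example matrix are all hypothetical. Because the sparsity pattern is fixed once the model is known, the schedule can be computed once offline and reused at every control step.

```python
# Illustrative sketch only: level scheduling for a sparse lower-triangular
# solve L x = b. Function names and data layout are assumptions, not taken
# from the paper.

def build_levels(rows):
    """Group rows into levels that can be solved independently.

    rows[i] is a dict {j: L_ij} holding the off-diagonal nonzeros of row i
    (all j < i). A row's level is one more than the deepest row it depends on.
    This step depends only on the sparsity pattern, so it can run ahead of time.
    """
    level = [0] * len(rows)
    for i, row in enumerate(rows):
        for j in row:
            level[i] = max(level[i], level[j] + 1)
    schedule = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        schedule[lv].append(i)
    return schedule


def solve_lower(rows, diag, b, schedule):
    """Forward substitution that follows a precomputed level schedule."""
    x = [0.0] * len(b)
    for group in schedule:
        # Rows in the same group have no dependencies on each other, so each
        # one could be assigned to a different core. Here they run serially.
        for i in group:
            acc = b[i]
            for j, value in rows[i].items():
                acc -= value * x[j]
            x[i] = acc / diag[i]
    return x


# Hypothetical 4x4 example: rows 0 and 1 are independent, row 2 needs row 0,
# and row 3 needs rows 1 and 2.
example_rows = [{}, {}, {0: 1.0}, {1: 3.0, 2: 1.0}]
example_diag = [2.0, 1.0, 4.0, 2.0]
example_b = [2.0, 3.0, 9.0, 13.0]

example_schedule = build_levels(example_rows)
example_x = solve_lower(example_rows, example_diag, example_b, example_schedule)
```

In this sketch, fewer nonzeros in the triangular factor generally mean fewer dependencies between rows, which gives shallower schedules and more rows per level. Whether the paper's coupling pruning has this effect on its own factors is not stated in the abstract.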