Co-designing a Programmable RISC-V Accelerator for MPC-based Energy and Thermal Management of Many-Core HPC Processors

📅 2025-10-10
🤖 AI Summary
Thermal management for many-core HPC processors runs into limited control bandwidth, high computational and memory overhead, and OS-induced jitter as the number of processing elements (PEs) grows. This paper proposes a lightweight hardware-software co-designed Model Predictive Control (MPC) framework to address these problems. It prunes weak thermal couplings to shrink the model's memory footprint and schedules the sparse triangular solves arising from the optimization problem ahead of time, so they execute efficiently in parallel. The operator-splitting quadratic programming (QP) solver runs on a dedicated embedded multi-core RISC-V controller, which gives fast, deterministic control. When controlling 144 PEs at 500 MHz, the controller reaches a control latency of 0.92 ms, 33× lower than a single-core baseline, with 7.9× higher energy efficiency. It consumes as little as 325 mW, fits in less than 1 MiB of memory, and occupies less than 1.5% of a typical HPC processor's die area.

📝 Abstract
Managing energy and thermal profiles is critical for many-core HPC processors with hundreds of application-class processing elements (PEs). Advanced model predictive control (MPC) delivers state-of-the-art performance but requires solving an online optimization problem over a thousand times per second (1 kHz control bandwidth), with computational and memory demands scaling with PE count. Traditional MPC approaches execute the controller on the PEs, but operating system overheads create jitter and limit control bandwidth. Running MPC on dedicated on-chip controllers enables fast, deterministic control but raises concerns about area and power overhead. In this work, we tackle these challenges by proposing a hardware-software codesign of a lightweight MPC controller, based on an operator-splitting quadratic programming solver and an embedded multi-core RISC-V controller. Key innovations include pruning weak thermal couplings to reduce model memory and ahead-of-time scheduling for efficient parallel execution of sparse triangular systems arising from the optimization problem. The proposed controller achieves sub-millisecond latency when controlling 144 PEs at 500 MHz, delivering 33× lower latency and 7.9× higher energy efficiency than a single-core baseline. Operating within a compact memory footprint of less than 1 MiB, it consumes as little as 325 mW while occupying less than 1.5% of a typical HPC processor's die area.
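To make the operator-splitting idea concrete, here is a minimal sketch in plain Python. It is not the paper's solver: it is a textbook ADMM iteration for a small box-constrained QP, and the function names, the box-constraint form, the penalty `rho`, and the fixed iteration count are all assumptions chosen for illustration. The system matrix is factored once, so every iteration reduces to two triangular solves and a clamp, which is presumably the kind of triangular workload the abstract refers to.

```python
import math


def cholesky(a):
    """Dense Cholesky factor L (lower triangular) with a = L * L^T."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_cholesky(low, b):
    """Solve (L * L^T) x = b with one forward and one backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward substitution: L^T x = y
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def admm_box_qp(p, q, lo, hi, rho=1.0, iters=200):
    """Minimize 0.5 x^T P x + q^T x subject to lo <= x <= hi (illustrative ADMM)."""
    n = len(q)
    # Factor P + rho * I once; it stays fixed across all iterations.
    system = [[p[i][j] + (rho if i == j else 0.0) for j in range(n)] for i in range(n)]
    low = cholesky(system)
    z = [0.0] * n  # copy of x that is kept inside the box
    u = [0.0] * n  # scaled dual variable
    for _ in range(iters):
        rhs = [rho * (z[i] - u[i]) - q[i] for i in range(n)]
        x = solve_cholesky(low, rhs)                                  # linear-system step
        z = [min(max(x[i] + u[i], lo[i]), hi[i]) for i in range(n)]  # projection step
        u = [u[i] + x[i] - z[i] for i in range(n)]                   # dual update
    return z
```

Because the factorization is reused, the per-iteration cost is dominated by the two substitutions. That makes the iteration count and the triangular-solve time the two quantities to watch when targeting a fixed control period.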
Problem

Research questions and friction points this paper is trying to address.

Designing efficient energy and thermal management for many-core HPC processors
Overcoming computational limitations of model predictive control at high frequencies
Reducing area and power overhead of dedicated on-chip MPC controllers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hardware-software codesign of RISC-V MPC controller
Pruning thermal couplings to reduce memory usage
Ahead-of-time scheduling for parallel sparse execution
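
The pruning and ahead-of-time scheduling ideas listed above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: the absolute `threshold` rule, the dictionary-of-rows sparse layout, and the use of classic level scheduling (grouping rows of a lower-triangular system by dependency depth) are all choices made here for the example. The page does not describe the paper's actual pruning criterion or scheduler.

```python
def prune_couplings(matrix, threshold):
    """Keep the diagonal and any off-diagonal entry with |value| >= threshold.

    Returns one {column: value} dict per row (a simple sparse layout).
    """
    return [
        {j: v for j, v in enumerate(row) if i == j or abs(v) >= threshold}
        for i, row in enumerate(matrix)
    ]


def level_schedule(lower_rows):
    """Group rows of a sparse lower-triangular system into dependency levels.

    Row i depends on every row j < i that has a stored entry in column j.
    Rows in the same level do not depend on each other, so the schedule can
    be computed once, offline, and reused for every solve.
    """
    levels = []
    level_of = []
    for i, row in enumerate(lower_rows):
        deps = [level_of[j] for j in row if j < i]
        lvl = 1 + max(deps) if deps else 0
        level_of.append(lvl)
        if lvl == len(levels):
            levels.append([])
        levels[lvl].append(i)
    return levels


def solve_by_levels(lower_rows, b, levels):
    """Forward substitution that follows a precomputed level schedule."""
    x = [0.0] * len(b)
    for group in levels:
        # Rows in one group are independent; on a multi-core controller
        # each row could be handed to a different core.
        for i in group:
            row = lower_rows[i]
            acc = sum(v * x[j] for j, v in row.items() if j != i)
            x[i] = (b[i] - acc) / row[i]
    return x
```

As a usage example, pruning a 4×4 coupling matrix whose entries decay with distance keeps only the diagonal and nearest-neighbour terms. A separate 4-row triangular system in which rows 0 and 1 have no dependencies, and rows 2 and 3 each depend on one of them, is scheduled into two levels of two rows each. The fewer levels a schedule has, the fewer synchronization points a parallel solve needs.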