🤖 AI Summary
Traditional Design Space Exploration (DSE) for CPU architectures faces three critical bottlenecks: poor generalization to unseen workloads and microarchitectures, inefficient search, and insufficient single-model capacity. Slowing Moore's Law scaling and tightening power constraints make these bottlenecks more costly. To address them, this paper proposes OneDSE, a unified performance prediction and DSE framework. Its key contributions are: (1) TrACE, a workload-aware Transformer-based performance predictor; (2) an efficient search mechanism grounded in metric-space mapping; and (3) a Multi-Agent Reinforcement Learning (MARL) framework enabling collaborative exploration across agents. Experiments show that TrACE outperforms state-of-the-art Artificial Neural Network (ANN) prediction baselines by 2.75× with fine-tuning and by 6.12× without fine-tuning (zero-shot); the metric-space search outperforms optimized metaheuristic algorithms by 1.19× in solution quality while cutting search time by an order of magnitude; and the MARL framework reduces prediction error by 10.6% relative to its non-MARL counterpart.
📝 Abstract
With the diminishing returns of Moore's Law scaling and increasingly impactful power constraints, processor designs rely on architectural innovation to achieve differentiating performance. The growing complexity of these innovations has enlarged the design space of modern high-performance processors. This work offers an efficient and novel design space exploration (DSE) solution to these challenges of modern CPU design. We identify three key challenges in past DSE approaches: (a) metric prediction is slow and inaccurate for unseen workloads and microarchitectures; (b) search is slow and inaccurate in the CPU parameter space; and (c) a single model is unable to learn the huge design space. We present OneDSE, a unified metric predictor and CPU parameter explorer that mitigates these challenges with three key techniques: (a) a Transformer-based workload-Aware CPU DSE (TrACE) predictor that outperforms state-of-the-art ANN-based prediction methods by 2.75x and 6.12x with and without fine-tuning, respectively, on several benchmarks; (b) a novel metric space search approach that outperforms optimized metaheuristics by 1.19x while reducing search time by an order of magnitude; and (c) a multi-agent reinforcement learning (MARL) framework that achieves a 10.6% reduction in prediction error compared to its non-MARL counterpart, enabling more accurate and efficient exploration of the CPU design space.