CORE: Constraint-Aware One-Step Reinforcement Learning for Simulation-Guided Neural Network Accelerator Design

📅 2025-06-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Hardware–mapping co-design for neural network accelerators faces challenges including a high-dimensional structured search space, complex hard constraints, and prohibitively expensive simulation overhead. Method: This paper proposes a constraint-aware one-shot reinforcement learning framework that eliminates conventional value functions and critic networks. Instead, it introduces an intra-batch reward comparison mechanism, a graph-structured decoder, a scaled-graph modeling approach for structured policy distributions, and simulation-feedback-driven reward shaping—enabling end-to-end joint optimization of constraint satisfaction and sampling efficiency. Results: In design space exploration (DSE), our method achieves 2.3× higher sample efficiency than state-of-the-art multi-step RL approaches, generates superior hardware configurations, increases the number of Pareto-optimal solutions by 37%, and consistently outperforms both heuristic and existing RL-based baselines across all performance metrics.

Technology Category

Application Category

📝 Abstract
Simulation-based design space exploration (DSE) aims to efficiently optimize high-dimensional structured designs under complex constraints and expensive evaluation costs. Existing approaches, including heuristic and multi-step reinforcement learning (RL) methods, struggle to balance sampling efficiency and constraint satisfaction due to sparse, delayed feedback, and large hybrid action spaces. In this paper, we introduce CORE, a constraint-aware, one-step RL method for simulationguided DSE. In CORE, the policy agent learns to sample design configurations by defining a structured distribution over them, incorporating dependencies via a scaling-graph-based decoder, and by reward shaping to penalize invalid designs based on the feedback obtained from simulation. CORE updates the policy using a surrogate objective that compares the rewards of designs within a sampled batch, without learning a value function. This critic-free formulation enables efficient learning by encouraging the selection of higher-reward designs. We instantiate CORE for hardware-mapping co-design of neural network accelerators, demonstrating that it significantly improves sample efficiency and achieves better accelerator configurations compared to state-of-the-art baselines. Our approach is general and applicable to a broad class of discrete-continuous constrained design problems.
Problem

Research questions and friction points this paper is trying to address.

Optimize high-dimensional designs under complex constraints efficiently
Balance sampling efficiency and constraint satisfaction in RL methods
Improve hardware-mapping co-design of neural network accelerators
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constraint-aware one-step RL method
Scaling-graph-based decoder for dependencies
Critic-free surrogate objective for efficiency
🔎 Similar Papers
No similar papers found.
Y
Yifeng Xiao
University of California, Berkeley
Y
Yurong Xu
N
Ning Yan
Futurewei Technologies
M
Masood S. Mortazavi
Futurewei Technologies
Pierluigi Nuzzo
Pierluigi Nuzzo
EECS Department, University of California, Berkeley
System DesignTrustworthy AIHardware SecurityCyber-Physical SystemsAMS Design