AscendOptimizer: Episodic Agent for Ascend NPU Operator Optimization

📅 2026-03-24
📝 Abstract
Operator optimization in AscendC (Ascend C) on Huawei Ascend neural processing units (NPUs) faces a two-fold knowledge bottleneck. First, unlike the CUDA ecosystem, there are few public reference implementations to learn from. Second, performance hinges on a coupled two-part artifact: a host-side tiling program that orchestrates data movement and a kernel program that schedules and pipelines instructions. We present AscendOptimizer, an episodic agent that bootstraps this missing expertise by turning execution into experience. On the host side, AscendOptimizer performs profiling-in-the-loop evolutionary search to discover valid and high-performing tiling and data-movement configurations directly from hardware feedback. On the kernel side, it mines transferable optimization motifs by rewinding optimized kernels, that is, systematically de-optimizing them to synthesize instructive "bad-to-good" trajectories, and distills these motifs into a retrievable experience bank for guided rewriting. By alternating host tuning and kernel rewriting in a closed loop, AscendOptimizer steadily expands feasibility and pushes latency down. On a benchmark of 127 real AscendC operators, AscendOptimizer achieves a 1.19x geometric-mean speedup over the open-source baseline, with 49.61% of operators outperforming their references, and it surpasses strong agent and search baselines.
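The host-side idea in the abstract, evolutionary search driven by profiling feedback, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two tiling knobs, the `mock_profile` cost model that stands in for a real on-device profiling run, the feasibility constraint, and the mutation scheme. The `geometric_mean_speedup` helper shows one common way a geometric-mean speedup such as the one reported above could be computed from per-operator latencies.

```python
import math
import random

# Hypothetical search space: candidate values for two tiling knobs.
TILE_SIZES = [32, 64, 128, 256, 512]
BUFFER_COUNTS = [1, 2, 4]


def mock_profile(config):
    """Stand-in for a hardware profiling run (an assumption, not a real API).

    Returns a latency in microseconds, or None when the configuration is
    treated as invalid (here: it exceeds a made-up buffer budget).
    """
    tile, buffers = config
    if tile * buffers > 1024:  # made-up feasibility constraint
        return None
    # Toy cost model: per-tile overhead shrinks with larger tiles, extra
    # buffers hide some latency, and very large tiles pay a linear penalty.
    return 1000.0 / tile + 50.0 / buffers + 0.05 * tile


def random_config(rng):
    return (rng.choice(TILE_SIZES), rng.choice(BUFFER_COUNTS))


def mutate(config, rng):
    """Re-sample one of the two knobs at random."""
    tile, buffers = config
    if rng.random() < 0.5:
        tile = rng.choice(TILE_SIZES)
    else:
        buffers = rng.choice(BUFFER_COUNTS)
    return (tile, buffers)


def evolve(profile, generations=20, population_size=8, seed=0):
    """Return (latency, config) for the best valid configuration seen."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    best = None
    for _ in range(generations):
        # Profile every candidate; invalid ones are dropped.
        scored = []
        for config in population:
            latency = profile(config)
            if latency is not None:
                scored.append((latency, config))
        scored.sort()
        if scored and (best is None or scored[0] < best):
            best = scored[0]
        # Keep the faster half, refill the rest with mutated survivors.
        survivors = [config for _, config in scored[: population_size // 2]]
        if not survivors:
            population = [random_config(rng) for _ in range(population_size)]
            continue
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return best


def geometric_mean_speedup(baseline_latencies, optimized_latencies):
    """Geometric mean of per-operator speedups (baseline / optimized)."""
    ratios = [b / o for b, o in zip(baseline_latencies, optimized_latencies)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    latency, config = evolve(mock_profile)
    print("best config:", config, "latency:", latency)
```

In a real setting the `profile` callable would compile and run the operator on the device and report measured latency; the toy model here only keeps the loop self-contained.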
Problem

Research questions and friction points this paper is trying to address.

Ascend NPU
operator optimization
knowledge bottleneck
tiling
kernel scheduling
Innovation

Methods, ideas, or system contributions that make the work stand out.

AscendOptimizer
episodic agent
operator optimization
experience distillation
evolutionary search
Jiehao Wu
School of Computer Science and Technology, East China Normal University
Zixiao Huang
East China Normal University PhD Student
Reinforcement learning
Wenhao Li
Assistant Professor, Tongji University
Agentic RL, Generative Simulation, Data-centric Optimization
Chuyun Shen
Shanghai University of International Business and Economics
Junjie Sheng
East China Normal University
Learning From Feedback, Multi-Agent, Scheduling & Planning
Xiangfeng Wang
Key Lab of Mathematics and Engineering Applications (MoE), East China Normal University; School of Mathematical Sciences, East China Normal University; Shenzhen Loop Area Institute (SLAI)