🤖 AI Summary
This work addresses the limitations of existing prompt-based agents in autonomous machine learning engineering, which suffer from behavioral stagnation due to frozen parameters, and of conventional reinforcement learning approaches, which are hampered by high execution latency and inefficient data selection. To overcome these challenges, the authors propose an adaptive curriculum sampling mechanism driven by an evolving data buffer and a learnable potential function. This approach continuously reuses execution trajectories and dynamically focuses on the agent's current learning frontier, substantially enhancing long-term iterative optimization efficiency. Evaluated within the GRPO framework, the Ace-30B model achieves a 100% valid submission rate on MLE-Bench-Lite, matching the performance of state-of-the-art closed-source models and outperforming larger open-source baselines such as DeepSeek-V3.2, thereby demonstrating significant advances in data utilization and learning efficiency.
📄 Abstract
Autonomous Machine Learning Engineering (MLE) requires agents to perform sustained, iterative optimization over long horizons. While recent LLM-based agents show promise, current prompt-based agents for MLE suffer from behavioral stagnation due to frozen parameters. Although Reinforcement Learning (RL) offers a remedy, applying it to MLE is hindered by prohibitive execution latency and inefficient data selection. Recognizing these challenges, we propose AceGRPO with two core components: (1) an Evolving Data Buffer that continuously repurposes execution traces into reusable training tasks, and (2) Adaptive Sampling guided by a Learnability Potential function, which dynamically prioritizes tasks at the agent's learning frontier to maximize learning efficiency. Leveraging AceGRPO, our trained Ace-30B model achieves a 100% valid submission rate on MLE-Bench-Lite, approaches the performance of proprietary frontier models, and outperforms larger open-source baselines (e.g., DeepSeek-V3.2), demonstrating robust capability for sustained iterative optimization. Code is available at https://github.com/yuzhu-cai/AceGRPO.
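To make the two components concrete, here is a minimal, hypothetical sketch of learnability-guided sampling from an evolving task buffer. The class name, the `p * (1 - p)` potential (which peaks for tasks the agent solves about half the time, i.e., its "frontier"), and the use of empirical success rates are illustrative assumptions, not the paper's exact formulation:

```python
import random


class EvolvingBuffer:
    """Toy buffer of tasks harvested from past execution traces (illustrative only)."""

    def __init__(self):
        # task_id -> empirical success rate of the current agent on that task
        self.tasks = {}

    def add(self, task_id, success_rate):
        """Repurpose an execution trace into a training task with its observed success rate."""
        self.tasks[task_id] = success_rate

    @staticmethod
    def learnability(p):
        # Hypothetical potential: vanishes for tasks the agent always solves
        # (p = 1) or never solves (p = 0), and peaks at p = 0.5 -- tasks at
        # the learning frontier contribute the most signal.
        return p * (1.0 - p)

    def sample(self, k, rng=random):
        """Draw k task ids with probability proportional to their learnability."""
        ids = list(self.tasks)
        weights = [self.learnability(self.tasks[t]) for t in ids]
        if sum(weights) == 0:
            # Degenerate case: everything is trivially easy or impossible.
            return rng.sample(ids, min(k, len(ids)))
        return rng.choices(ids, weights=weights, k=k)


buf = EvolvingBuffer()
buf.add("solved-task", 1.0)     # always solved -> zero potential
buf.add("unsolved-task", 0.0)   # never solved -> zero potential
buf.add("frontier-task", 0.5)   # frontier -> maximal potential
batch = buf.sample(8, rng=random.Random(0))
```

With these success rates, only `frontier-task` carries nonzero weight, so every draw lands on the frontier; as training shifts the success rates, the sampling distribution shifts with them.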