🤖 AI Summary
This work addresses the limitations of existing prompt-based agents in autonomous machine learning engineering, which suffer from behavioral stagnation due to frozen parameters, and of conventional reinforcement learning approaches, which are hampered by high execution latency and inefficient data selection. To overcome these challenges, the authors propose an adaptive curriculum sampling mechanism driven by an evolving data buffer and a learnable potential function. This approach continuously reuses execution trajectories and dynamically focuses on the agent's current learning frontier, substantially enhancing long-term iterative optimization efficiency. Evaluated within the GRPO framework, the Ace-30B model achieves a 100% valid submission rate on MLE-Bench-Lite, matching the performance of state-of-the-art closed-source models and outperforming larger open-source baselines such as DeepSeek-V3.2, thereby demonstrating significant advances in data utilization and learning efficiency.
📄 Abstract
Autonomous Machine Learning Engineering (MLE) requires agents to perform sustained, iterative optimization over long horizons. While recent LLM-based agents show promise, current prompt-based agents for MLE suffer from behavioral stagnation due to frozen parameters. Although Reinforcement Learning (RL) offers a remedy, applying it to MLE is hindered by prohibitive execution latency and inefficient data selection. Recognizing these challenges, we propose AceGRPO with two core components: (1) an Evolving Data Buffer that continuously repurposes execution traces into reusable training tasks, and (2) Adaptive Sampling guided by a Learnability Potential function, which dynamically prioritizes tasks at the agent's learning frontier to maximize learning efficiency. Leveraging AceGRPO, our trained Ace-30B model achieves a 100% valid submission rate on MLE-Bench-Lite, approaches the performance of proprietary frontier models, and outperforms larger open-source baselines (e.g., DeepSeek-V3.2), demonstrating robust capability for sustained iterative optimization. Code is available at https://github.com/yuzhu-cai/AceGRPO.
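To make the two components concrete, here is a minimal, hypothetical sketch of learnability-guided sampling from an evolving task buffer. The class name, the `p * (1 - p)` potential (which peaks for tasks the agent solves about half the time, i.e., its "frontier"), and the use of empirical success rates are illustrative assumptions, not the paper's exact formulation:

```python
import random


class EvolvingBuffer:
    """Toy buffer of tasks harvested from past execution traces (illustrative only)."""

    def __init__(self):
        # task_id -> empirical success rate of the current agent on that task
        self.tasks = {}

    def add(self, task_id, success_rate):
        """Repurpose an execution trace into a training task with its observed success rate."""
        self.tasks[task_id] = success_rate

    @staticmethod
    def learnability(p):
        # Hypothetical potential: vanishes for tasks the agent always solves
        # (p = 1) or never solves (p = 0), and peaks at p = 0.5 -- tasks at
        # the learning frontier contribute the most signal.
        return p * (1.0 - p)

    def sample(self, k, rng=random):
        """Draw k task ids with probability proportional to their learnability."""
        ids = list(self.tasks)
        weights = [self.learnability(self.tasks[t]) for t in ids]
        if sum(weights) == 0:
            # Degenerate case: everything is trivially easy or impossible.
            return rng.sample(ids, min(k, len(ids)))
        return rng.choices(ids, weights=weights, k=k)


buf = EvolvingBuffer()
buf.add("solved-task", 1.0)     # always solved -> zero potential
buf.add("unsolved-task", 0.0)   # never solved -> zero potential
buf.add("frontier-task", 0.5)   # frontier -> maximal potential
batch = buf.sample(8, rng=random.Random(0))
```

With these success rates, only `frontier-task` carries nonzero weight, so every draw lands on the frontier; as training shifts the success rates, the sampling distribution shifts with them.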