Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering

πŸ“… 2026-01-15
πŸ“ˆ Citations: 4
✨ Influential: 0
πŸ€– AI Summary
This work addresses the challenge of maintaining strategic coherence and iterative refinement in artificial intelligence systems over ultra-long scientific research cycles. To this end, it introduces the ML-Master 2.0 agent, which reconfigures context management as a cognitive accumulation process through a Hierarchical Cognitive Cache (HCC) architecture. Inspired by multi-level memory systems, HCC dynamically distills execution trajectories into stable knowledge, decoupling immediate actions from long-term strategy and thereby transcending the limitations of static context windows. Integrated with dynamic knowledge distillation, cross-task experience consolidation, and large language model–driven autonomous experiment planning, the proposed approach achieves a state-of-the-art medal rate of 56.44% on MLE-Bench under a 24-hour budget, demonstrating for the first time the feasibility of fully autonomous, ultra-long-horizon scientific discovery.

πŸ“ Abstract
The advancement of artificial intelligence toward agentic science is currently bottlenecked by the challenge of ultra-long-horizon autonomy: the ability to sustain strategic coherence and iterative correction over experimental cycles spanning days or weeks. While Large Language Models (LLMs) have demonstrated prowess in short-horizon reasoning, they are easily overwhelmed by execution details in the high-dimensional, delayed-feedback environments of real-world research, failing to consolidate sparse feedback into coherent long-term guidance. Here, we present ML-Master 2.0, an autonomous agent that masters ultra-long-horizon machine learning engineering (MLE), a representative microcosm of scientific discovery. By reframing context management as a process of cognitive accumulation, our approach introduces Hierarchical Cognitive Caching (HCC), a multi-tiered architecture inspired by computer memory systems that enables the structural differentiation of experience over time. By dynamically distilling transient execution traces into stable knowledge and cross-task wisdom, HCC allows agents to decouple immediate execution from long-term experimental strategy, effectively overcoming the scaling limits of static context windows. In evaluations on OpenAI's MLE-Bench under a 24-hour budget, ML-Master 2.0 achieves a state-of-the-art medal rate of 56.44%. Our findings demonstrate that ultra-long-horizon autonomy provides a scalable blueprint for AI capable of autonomous exploration beyond the complexity of human precedent.
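The tiering idea the abstract describes (transient traces distilled into stable knowledge, then consolidated into cross-task wisdom) can be sketched as follows. This is a minimal illustration under assumed semantics, not the paper's implementation: the class and method names (`HierarchicalCognitiveCache`, `record`, `_distill`, `consolidate_task`) are hypothetical, and the string summaries stand in for what would be LLM-driven distillation.

```python
# Hypothetical sketch of a three-tier cognitive cache; names are illustrative,
# not taken from ML-Master 2.0.
from dataclasses import dataclass, field

@dataclass
class HierarchicalCognitiveCache:
    """Tiers: transient traces -> distilled knowledge -> cross-task wisdom."""
    traces: list = field(default_factory=list)      # raw execution details
    knowledge: list = field(default_factory=list)   # stable per-task insights
    wisdom: list = field(default_factory=list)      # consolidated across tasks
    trace_capacity: int = 4                         # stand-in for a context budget

    def record(self, trace: str) -> None:
        """Append a raw trace; distill when the transient tier overflows."""
        self.traces.append(trace)
        if len(self.traces) > self.trace_capacity:
            self._distill()

    def _distill(self) -> None:
        """Compress the overflowing traces into one stable knowledge entry.
        A real system would summarize with an LLM here."""
        batch, self.traces = self.traces, []
        self.knowledge.append(f"summary of {len(batch)} traces")

    def consolidate_task(self) -> None:
        """At task end, fold per-task knowledge into cross-task wisdom."""
        if self.knowledge:
            self.wisdom.append(f"lesson from {len(self.knowledge)} insights")
            self.knowledge = []

    def strategy_context(self) -> list:
        """Long-horizon planning reads only the stable tiers, never raw traces,
        which is how execution is decoupled from strategy."""
        return self.wisdom + self.knowledge
```

The point of the sketch is the decoupling: `strategy_context` exposes only distilled tiers, so the planner's input stays bounded no matter how many execution steps accumulate.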
Problem

Research questions and friction points this paper is trying to address.

ultra-long-horizon autonomy
agentic science
machine learning engineering
cognitive accumulation
delayed-feedback environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Cognitive Caching
Ultra-Long-Horizon Autonomy
Cognitive Accumulation
Agentic Science
Context Management