AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

📅 2025-08-22
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Existing adaptive LLM agent approaches face a fundamental trade-off: hand-crafted reflective mechanisms lack flexibility, while gradient-based fine-tuning incurs prohibitive computational overhead. This paper proposes a fine-tuning-free, low-cost continual adaptation framework grounded in a memory-augmented Markov Decision Process (M-MDP). The framework combines a neural case-selection policy, an episodic memory that is either differentiable or non-parametric, a memory-rewriting mechanism, and online reinforcement learning, enabling gradient-free evolution of agent behavior. Crucially, it avoids LLM parameter updates entirely while supporting real-time adaptation to novel tasks and environments. Empirically, the method achieves state-of-the-art out-of-distribution generalization: 87.88% Pass@3 on the GAIA validation set (79.40% on the test set) and 66.6% F1 and 80.4% Process Match (PM) on the DeepResearcher benchmark, consistently surpassing existing training-intensive approaches.
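
As a concrete illustration of the memory-augmented loop the summary describes, the minimal sketch below pairs a non-parametric episodic memory with a frozen LLM: past cases are retrieved by similarity and given to the model as exemplars, so adaptation happens through memory rather than weight updates. All names here (Case, EpisodicMemory, agent_step) and the embedding-based retrieval are illustrative assumptions, not the AgentFly API.

```python
# Minimal sketch of one memory-augmented decision step. Hypothetical names;
# `embed` and `llm_act` are assumed callables (text -> vector, prompt -> action).
from dataclasses import dataclass, field

import numpy as np


@dataclass
class Case:
    """One stored experience: a past state, the action taken, and its reward."""
    state_embedding: np.ndarray
    action: str
    reward: float


@dataclass
class EpisodicMemory:
    """Non-parametric episodic memory: a growing list of past cases."""
    cases: list[Case] = field(default_factory=list)

    def write(self, case: Case) -> None:
        self.cases.append(case)

    def read(self, query: np.ndarray, k: int = 4) -> list[Case]:
        """Retrieve the k most similar past cases by cosine similarity."""
        if not self.cases:
            return []
        sims = [
            float(query @ c.state_embedding)
            / (np.linalg.norm(query) * np.linalg.norm(c.state_embedding))
            for c in self.cases
        ]
        top = np.argsort(sims)[-k:][::-1]
        return [self.cases[i] for i in top]


def agent_step(memory: EpisodicMemory, state: str, embed, llm_act) -> str:
    """Act with a frozen LLM conditioned on retrieved cases, not weight updates."""
    retrieved = memory.read(embed(state))
    exemplars = "\n".join(
        f"Past action: {c.action} (reward={c.reward:.2f})" for c in retrieved
    )
    return llm_act(state=state, exemplars=exemplars)
```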

📝 Abstract
In this paper, we introduce a novel learning paradigm for adaptive Large Language Model (LLM) agents that eliminates the need for fine-tuning the underlying LLMs. Existing approaches are often either rigid, relying on static, handcrafted reflection workflows, or computationally intensive, requiring gradient updates of LLM model parameters. In contrast, our method enables low-cost continual adaptation via memory-based online reinforcement learning. We formalise this as a Memory-augmented Markov Decision Process (M-MDP), equipped with a neural case-selection policy to guide action decisions. Past experiences are stored in an episodic memory, either differentiable or non-parametric. The policy is continually updated based on environmental feedback through a memory rewriting mechanism, whereas policy improvement is achieved through efficient memory reading (retrieval). We instantiate our agent model in the deep research setting, namely AgentFly, which attains top-1 on GAIA validation (87.88% Pass@3) and 79.40% on the test set. It reaches 66.6% F1 and 80.4% PM on the DeepResearcher dataset, outperforming the state-of-the-art training-based method, while case-based memory adds 4.7% to 9.6% absolute points on out-of-distribution tasks. Our approach offers a scalable and efficient pathway for developing generalist LLM agents capable of continuous, real-time learning without gradient updates, advancing machine learning towards open-ended skill acquisition and deep research scenarios. The code is available at https://github.com/Agent-on-the-Fly/AgentFly.
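
For the differentiable variant of the memory mentioned in the abstract, the neural case-selection policy can be pictured as a small trainable scorer over (state, case) pairs whose parameters, not the LLM's, receive gradients. The sketch below is one hedged reading of that idea; the bilinear scorer and the REINFORCE-style update are assumptions, not the paper's exact formulation.

```python
# Hypothetical differentiable case-selection policy: a small network scores
# each stored case against the current state, and a softmax turns scores into
# retrieval probabilities. Only this scorer is trained; the LLM stays frozen.
import torch
import torch.nn as nn


class CaseSelectionPolicy(nn.Module):
    def __init__(self, dim: int = 768):
        super().__init__()
        # Bilinear score: s(state, case) = case^T W state (sizes are illustrative).
        self.W = nn.Parameter(torch.randn(dim, dim) * 0.01)

    def forward(self, state: torch.Tensor, cases: torch.Tensor) -> torch.Tensor:
        """state: (dim,), cases: (n_cases, dim) -> retrieval distribution (n_cases,)."""
        scores = cases @ (self.W @ state)
        return torch.softmax(scores, dim=-1)


def reinforce_update(policy, optimizer, state, cases, chosen: int, reward: float):
    """One policy-gradient step: make rewarded case selections more likely."""
    probs = policy(state, cases)
    loss = -reward * torch.log(probs[chosen])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

A typical optimizer here would be torch.optim.Adam(policy.parameters(), lr=1e-3); only these few weights are trained, which is why this kind of adaptation stays cheap relative to fine-tuning the LLM itself.
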
Problem

Research questions and friction points this paper is trying to address.

Adapting LLM agents without fine-tuning LLMs
Enabling low-cost continual learning via memory
Improving agent performance in deep research scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Memory-based online reinforcement learning for adaptation
Neural case-selection policy guiding action decisions
Memory rewriting mechanism enabling continuous policy improvement (see the sketch after this list)
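
Reusing the Case, EpisodicMemory, and agent_step helpers from the first sketch above, the loop below shows how memory rewriting could close the online learning loop: each episode's outcome is written back to memory, so the next retrieval (policy improvement via reading) is informed by fresh feedback. The env interface and scalar reward are assumptions for illustration, not the paper's actual setup.

```python
# Hedged sketch of memory-based online reinforcement learning: the LLM's
# weights never change; only the episodic memory is rewritten after feedback.
def online_adaptation_loop(memory, env, embed, llm_act, episodes: int = 100):
    for _ in range(episodes):
        state = env.reset()
        # Policy improvement by reading: retrieve past cases and act.
        action = agent_step(memory, state, embed, llm_act)
        # Environmental feedback (assumed to be a scalar reward).
        reward = env.step(action)
        # Memory rewriting: store the new experience for future retrieval.
        memory.write(Case(state_embedding=embed(state), action=action, reward=reward))
```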