🤖 AI Summary
Large language model (LLM) agents struggle with multi-step tool-use tasks due to the scarcity of high-quality expert demonstrations, overfitting in supervised fine-tuning (SFT), and cold-start and training-instability issues in reinforcement learning (RL).
Method: We propose *Environment Tuning*, a novel paradigm in which agents autonomously acquire complex behaviors directly from raw problem instances, without expert trajectories, via structured curricula, actionable environment augmentation, fine-grained progress rewards, and synthetic environment interaction.
Contribution/Results: This approach circumvents fundamental limitations of SFT and RL, effectively mitigating cold-start and overfitting challenges while substantially improving out-of-distribution generalization. Using only 400 BFCL task instances, our method matches or exceeds state-of-the-art baselines and demonstrates robust performance on unseen tasks.
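
To make one of these components concrete, here is a minimal sketch of what a fine-grained progress reward could look like for a multi-turn tool-use episode: partial credit is granted as subgoals are completed, rather than a single sparse end-of-episode reward. The `ToolUseTask` class, its subgoal predicates, and the equal-weight credit scheme are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumed, not the paper's code): a fine-grained progress
# reward that grants partial credit as the agent completes subgoals of a
# multi-turn tool-use task, instead of one sparse end-of-episode reward.
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolUseTask:
    # Hypothetical subgoal predicates over the environment state, e.g.
    # "correct tool selected", "arguments valid", "final answer returned".
    subgoals: list[Callable[[Any], bool]]
    completed: set[int] = field(default_factory=set)

    def progress_reward(self, state: Any) -> float:
        """Return dense reward for subgoals newly satisfied this turn."""
        reward = 0.0
        for i, check in enumerate(self.subgoals):
            if i not in self.completed and check(state):
                self.completed.add(i)
                reward += 1.0 / len(self.subgoals)  # equal-weight credit
        return reward


# Toy usage: two subgoals, one satisfied -> reward of 0.5 this turn.
task = ToolUseTask(subgoals=[lambda s: "tool_called" in s,
                             lambda s: "answer" in s])
print(task.progress_reward({"tool_called": True}))  # 0.5
```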
📝 Abstract
Large Language Model (LLM) agents show great promise for complex, multi-turn tool-use tasks, but their development is often hampered by the extreme scarcity of high-quality training data. Supervised fine-tuning (SFT) on synthetic data leads to overfitting, whereas standard reinforcement learning (RL) struggles with a critical cold-start problem and training instability. To address these challenges, we introduce **Environment Tuning**, a novel training paradigm that enables agents to learn complex behaviors directly from problem instances without relying on pre-collected expert trajectories. **Environment Tuning** orchestrates this learning process through a structured curriculum, actionable environment augmentation that provides corrective feedback, and fine-grained progress rewards to ensure stable and efficient exploration. Using only 400 problem instances from the Berkeley Function-Calling Leaderboard (BFCL) benchmark, our method not only achieves competitive in-distribution performance against strong baselines but also demonstrates superior out-of-distribution generalization, overcoming the performance collapse common to SFT-based approaches. Our work presents a paradigm shift from supervised fine-tuning on static trajectories to dynamic, environment-based exploration, paving the way for training more robust and data-efficient agents.
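
To ground the idea of actionable environment augmentation, the sketch below shows one way an environment could turn opaque tool-call failures into corrective feedback the agent can act on in its next turn. The wrapper function, the error cases it handles, and the hint wording are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch (assumed): an environment wrapper that replaces raw
# tool-call failures with actionable, corrective feedback messages.
import json
from typing import Callable


def call_tool_with_feedback(tool: Callable, name: str, args_json: str) -> str:
    """Run `tool`; on failure, return a hint the agent can act on next turn."""
    try:
        args = json.loads(args_json)
    except json.JSONDecodeError as e:
        return (f"Error: arguments for '{name}' are not valid JSON ({e}). "
                "Re-emit the call as a well-formed JSON object.")
    try:
        result = tool(**args)
    except TypeError as e:
        return (f"Error: '{name}' received wrong or missing parameters ({e}). "
                "Check the tool's schema and retry with the required arguments.")
    return json.dumps(result, default=str)
```

The design intuition is that a bare stack trace gives a policy little to learn from during cold-start exploration, whereas feedback phrased as a next action keeps trajectories recoverable and the reward signal informative.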