🤖 AI Summary
This work addresses the challenge that large language model (LLM) agents struggle to dynamically adapt their strategies in long-horizon, complex tasks due to static configurations. The authors propose ToolSelf, a novel paradigm that models configuration updates as callable tools, unifying task execution with runtime self-reconfiguration. This enables agents to autonomously adjust their subgoals, context, strategies, and toolsets. For the first time, self-reconfiguration is internalized as a native tool, shifting agent behavior from externally imposed rules to intrinsically managed parameters and thereby empowering agents to act as dual managers of both their tasks and themselves. Through Configuration-Aware Two-stage Training (CAT)—combining rejection-sampling fine-tuning with trajectory-level reinforcement learning—the agent internalizes this meta-adaptive capability. Evaluated across multiple benchmarks, ToolSelf achieves an average performance gain of 24.1%, matching specialized workflows while preserving generality, and demonstrates strong task generalization and self-adaptation.
📝 Abstract
Agentic systems powered by Large Language Models (LLMs) have demonstrated remarkable potential in tackling complex, long-horizon tasks. However, their efficacy is fundamentally constrained by static configurations governing agent behaviors, which are fixed prior to execution and fail to adapt to evolving task dynamics. Existing approaches, relying on manual orchestration or heuristic-based patches, often struggle with poor generalization and fragmented optimization. To transcend these limitations, we propose ToolSelf, a novel paradigm enabling tool-driven runtime self-reconfiguration. By abstracting configuration updates as a callable tool, ToolSelf unifies task execution and self-adjustment into a single action space, achieving a phase transition from external rules to intrinsic parameters. Agents can thereby autonomously update their sub-goals and context based on task progression, and correspondingly adapt their strategy and toolbox, transforming from passive executors into dual managers of both task and self. We further devise Configuration-Aware Two-stage Training (CAT), combining rejection sampling fine-tuning with trajectory-level reinforcement learning to internalize this meta-capability. Extensive experiments across diverse benchmarks demonstrate that ToolSelf rivals specialized workflows while generalizing to novel tasks, achieving a 24.1% average performance gain and illuminating a path toward truly self-adaptive agents.
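To make the core idea concrete, the sketch below illustrates what "abstracting configuration updates as a callable tool" could look like in code. This is an illustrative assumption, not the paper's implementation: the `AgentConfig` fields, the `update_config` tool name, and the registry structure are all hypothetical, chosen only to show how a self-reconfiguration action can share one action space with ordinary task tools.

```python
# Hypothetical sketch of ToolSelf's central idea: the agent's runtime
# configuration (subgoals, strategy, toolbox) is mutated via a tool call
# that lives in the SAME registry as ordinary task tools, so the policy
# chooses between "act on the task" and "reconfigure myself" uniformly.
from dataclasses import dataclass, field


@dataclass
class AgentConfig:
    """Runtime configuration the agent is allowed to rewrite mid-task."""
    subgoals: list = field(default_factory=list)
    strategy: str = "default"
    toolbox: set = field(default_factory=lambda: {"search", "update_config"})


class Agent:
    def __init__(self):
        self.config = AgentConfig()
        # Task tools and the self-reconfiguration tool share one action space.
        self.tools = {
            "search": lambda query: f"results for {query}",
            "update_config": self._update_config,
        }

    def _update_config(self, **changes):
        # Self-reconfiguration handled exactly like any other tool call:
        # each keyword argument overwrites one configuration field.
        for key, value in changes.items():
            setattr(self.config, key, value)
        return self.config

    def call(self, tool_name, **kwargs):
        # Only tools currently in the (mutable) toolbox are callable.
        if tool_name not in self.config.toolbox:
            raise ValueError(f"tool {tool_name!r} not in current toolbox")
        return self.tools[tool_name](**kwargs)


agent = Agent()
# Mid-task, the agent "decides" to reconfigure itself via a tool call.
agent.call("update_config", strategy="decompose_first",
           subgoals=["gather evidence", "synthesize answer"])
```

Under this framing, the policy never needs a separate control channel for adaptation: emitting `update_config(...)` is just another action, which is what lets a single trained model manage both the task and its own configuration.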