🤖 AI Summary
Traditional agents rely on external prompting to orchestrate tool usage, limiting the autonomy of reasoning models. This work introduces Large Agent Models (LAMs), which internalize Chain-of-Action (CoA) generation within the reasoning process, enabling end-to-end autonomous decision-making and environment interaction. The main contributions are: (1) the first internalized CoA generation mechanism; (2) the AutoCoA framework, integrating step-level action triggering, trajectory-level CoA optimization, and a lightweight internal world model; and (3) a dynamic reasoning–action switching mechanism trained via joint supervised fine-tuning and reinforcement learning. Evaluated on open-domain question answering, LAMs significantly outperform ReAct-based workflows—achieving higher task completion rates, especially in long-horizon reasoning and complex multi-step scenarios—while demonstrating superior robustness and generalization.
📝 Abstract
Traditional agentic workflows rely on external prompts to manage interactions with tools and the environment, which limits the autonomy of reasoning models. We propose *Large Agent Models (LAMs)*, which internalize the generation of *Chain-of-Action (CoA)*, enabling the model to autonomously decide when and how to use external tools. Our proposed AutoCoA framework combines supervised fine-tuning (SFT) and reinforcement learning (RL), allowing the model to seamlessly switch between reasoning and action while efficiently managing environment interactions. Its main components are step-level action triggering, trajectory-level CoA optimization, and an internal world model that reduces real-environment interaction costs. Evaluations on open-domain QA tasks demonstrate that AutoCoA-trained agent models significantly outperform ReAct-based workflows in task completion, especially on tasks that require long-term reasoning and multi-step actions. Code and dataset are available at https://github.com/ADaM-BJTU/AutoCoA.
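
The dynamic reasoning–action switching described above can be sketched as a simple generation loop: the model either continues reasoning toward an answer or emits an action that is executed against the environment (real or simulated by the internal world model), with the observation appended to the context. This is a minimal illustrative sketch; the `<action>`/`<obs>` tag convention, the `generate` callable, and `search_tool` are assumptions for illustration, not the actual AutoCoA interface.

```python
def run_agent(generate, search_tool, question, max_steps=8):
    """Alternate between reasoning and tool actions until the model
    produces a final answer (or the step budget runs out).

    generate:    callable(context) -> next model segment (hypothetical)
    search_tool: callable(query) -> observation string (hypothetical)
    """
    context = question
    for _ in range(max_steps):
        segment = generate(context)
        if segment.startswith("<action>"):
            # Model autonomously triggered an action (step-level trigger).
            query = segment[len("<action>"):].strip()
            observation = search_tool(query)
            # Feed the observation back so reasoning can continue.
            context += f"\n{segment}\n<obs>{observation}</obs>"
        else:
            # Model chose to keep reasoning and emit a final answer.
            return segment
    return None  # step budget exhausted without a final answer
```

With a toy `generate` that first emits an action and then an answer, the loop performs one tool call and returns the answer, mirroring how an internalized CoA model would interleave the two modes without external prompting.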