Agent models: Internalizing Chain-of-Action Generation into Reasoning models

📅 2025-03-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Traditional agents rely on external prompting to orchestrate tool usage, limiting the autonomy of reasoning models. This work introduces Large Agent Models (LAMs), which internalize Chain-of-Action (CoA) generation within the reasoning process, enabling end-to-end autonomous decision-making and environment interaction. The contributions are: (1) the first internalized CoA generation mechanism; (2) the AutoCoA framework, integrating step-level action triggering, trajectory-level CoA optimization, and a lightweight internal world model; and (3) a dynamic reasoning–action switching mechanism trained via joint supervised fine-tuning and reinforcement learning. Evaluated on open-domain question answering, LAMs significantly outperform ReAct, achieving higher task completion rates, especially in long-horizon reasoning and complex multi-step scenarios, while demonstrating superior robustness and generalization.

📝 Abstract
Traditional agentic workflows rely on external prompts to manage interactions with tools and the environment, which limits the autonomy of reasoning models. We propose Large Agent Models (LAMs) that internalize the generation of Chain-of-Action (CoA), enabling the model to autonomously decide when and how to use external tools. Our proposed AutoCoA framework combines supervised fine-tuning (SFT) and reinforcement learning (RL), allowing the model to seamlessly switch between reasoning and action while efficiently managing environment interactions. Main components include step-level action triggering, trajectory-level CoA optimization, and an internal world model to reduce real-environment interaction costs. Evaluations on open-domain QA tasks demonstrate that AutoCoA-trained agent models significantly outperform ReAct-based workflows in task completion, especially in tasks that require long-term reasoning and multi-step actions. Code and dataset are available at https://github.com/ADaM-BJTU/AutoCoA
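The abstract's core loop — switching between internal reasoning and tool actions, with a lightweight world model absorbing some environment calls — can be sketched roughly as follows. This is a minimal illustration, not the AutoCoA implementation: all names (`WorldModel`, `act_trigger`, `run_chain_of_action`) and the keyword-based trigger heuristic are assumptions standing in for the model's learned behavior.

```python
# Hypothetical sketch of the reasoning-action switching loop; names and the
# trigger heuristic are illustrative, not from the AutoCoA codebase.
from dataclasses import dataclass, field


@dataclass
class WorldModel:
    """Lightweight internal world model: caches simulated tool results so the
    agent can avoid repeated real-environment interactions."""
    cache: dict = field(default_factory=dict)

    def simulate(self, query: str) -> str:
        return self.cache.setdefault(query, f"simulated result for {query!r}")


def act_trigger(step: str) -> bool:
    """Step-level action trigger: decide whether this step needs a tool call.
    A trivial keyword check stands in for the model's learned decision."""
    return step.startswith("SEARCH:")


def run_chain_of_action(steps, world: WorldModel):
    """Alternate between pure reasoning and (simulated) tool actions,
    collecting a trajectory of (step, result) pairs."""
    trajectory = []
    for step in steps:
        if act_trigger(step):
            result = world.simulate(step.removeprefix("SEARCH:").strip())
        else:
            result = None  # pure reasoning step, no environment interaction
        trajectory.append((step, result))
    return trajectory


trajectory = run_chain_of_action(
    ["Think about the question", "SEARCH: capital of France", "Conclude"],
    WorldModel(),
)
```

In the paper the trigger is internalized into the model's token generation rather than coded as a rule; the sketch only makes the control flow explicit.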
Problem

Research questions and friction points this paper is trying to address.

Enhance autonomy in reasoning models by internalizing action generation.
Develop AutoCoA framework combining SFT and RL for efficient tool usage.
Improve task completion in long-term reasoning and multi-step actions.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Internalizes Chain-of-Action generation autonomously
Combines supervised fine-tuning with reinforcement learning
Optimizes action triggering and reduces interaction costs
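The combination of supervised fine-tuning with reinforcement learning noted above can be pictured as a weighted joint objective. The following is a toy sketch under stated assumptions: a REINFORCE-style trajectory reward and the weighting `lam` are illustrative placeholders, not details from the paper.

```python
# Hypothetical joint SFT + RL objective; the reward formulation and the
# weighting factor are assumptions, not taken from AutoCoA.

def sft_loss(log_probs):
    # Supervised fine-tuning term: mean negative log-likelihood of
    # expert Chain-of-Action tokens.
    return -sum(log_probs) / len(log_probs)


def rl_loss(log_probs, reward, baseline=0.0):
    # REINFORCE-style policy-gradient term on a trajectory-level reward
    # (e.g. task completion), standing in for the paper's RL stage.
    advantage = reward - baseline
    return -advantage * sum(log_probs)


def joint_loss(log_probs, reward, lam=0.5):
    # lam balances imitation (SFT) against outcome optimization (RL);
    # 0.5 is an arbitrary placeholder value.
    return sft_loss(log_probs) + lam * rl_loss(log_probs, reward)
```

A trajectory with log-probs `[-0.1, -0.2]` and reward `1.0` yields an SFT term of 0.15 and an RL term of 0.3, combining to 0.3 at `lam=0.5`.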
Yuxiang Zhang
School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China
Yuqi Yang
Nankai University
Jiangming Shu
School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China
Xinyan Wen
School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China
Jitao Sang
School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China