A²FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning

📅 2025-10-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large language models (LLMs) suffer from a binary divide: reasoning-oriented models lack native tool-calling capabilities, while agent-oriented models exhibit insufficient deep reasoning, leading to over-reasoning or redundant tool invocations on simple queries. To bridge this gap, we propose A²FM, an Adaptive Agent Foundation Model that unifies reasoning and tool execution. A²FM introduces a novel "route-then-align" mechanism that dynamically switches among three execution modes within a shared backbone: reasoning, agentic tool invocation, and an instant mode that answers trivial queries directly, bypassing unnecessary computation. Adaptive Policy Optimization (APO) jointly optimizes accuracy and efficiency via cost-regularized reward shaping and adaptive cross-mode sampling. Evaluated at the 32B scale, A²FM achieves 70.4% on AIME25, 13.4% on BrowseComp, and 16.7% on HLE, at a cost of just $0.00487 per correct answer, cutting costs by 45.2% and 33.5% versus pure reasoning and pure tool-based baselines, respectively.

📝 Abstract
Large language models split into two families: reasoning-centric LLMs, which strengthen internal chain-of-thought reasoning but cannot invoke external tools, and agentic LLMs, which learn to interact with environments and leverage tools but often lag in deep reasoning. This divide arises from fundamentally different training objectives, leading to mismatched strengths and inefficiency on simple queries, where both families tend to overthink or over-call tools. In this work, we present the Adaptive Agent Foundation Model (A²FM), a unified framework that follows a route-then-align principle: the model first learns task-aware routing and then aligns mode-specific trajectories under a shared backbone. To address the inefficiency gap, we introduce a third mode, instant, that handles simple queries directly, preventing unnecessary reasoning or tool calls while complementing the agentic and reasoning modes. To jointly enhance accuracy and efficiency, we propose Adaptive Policy Optimization (APO), which enforces adaptive sampling across modes and applies a cost-regularized reward. At the 32B scale, A²FM achieves 13.4% on BrowseComp, 70.4% on AIME25, and 16.7% on HLE, setting a new SOTA among comparable models and performing competitively with frontier LLMs across agentic, reasoning, and general benchmarks. Notably, the adaptive execution achieves a cost-of-pass of only $0.00487 per correct answer, cutting cost by 45.2% relative to reasoning and 33.5% relative to agentic execution, thus delivering substantially higher cost efficiency while maintaining comparable accuracy.
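The abstract describes APO as applying a cost-regularized reward and reports cost-of-pass as the efficiency metric. The paper's exact reward formulation and coefficients are not given here, so the sketch below is a minimal illustrative version, assuming a simple "task reward minus scaled cost" shaping; the penalty weight `lam` is a hypothetical parameter, not one from the paper.

```python
def cost_regularized_reward(correct: bool, cost_usd: float, lam: float = 10.0) -> float:
    """Illustrative cost-regularized reward: correctness reward minus a
    scaled execution cost, discouraging needless reasoning or tool calls."""
    task_reward = 1.0 if correct else 0.0
    return task_reward - lam * cost_usd

def cost_of_pass(total_cost_usd: float, num_correct: int) -> float:
    """Average spend per correct answer, the efficiency metric quoted above."""
    return total_cost_usd / num_correct
```

Under this shaping, a correct answer produced cheaply scores higher than the same answer produced with redundant tool invocations, which is the incentive the abstract attributes to APO.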
Problem

Research questions and friction points this paper is trying to address.

Unifying reasoning-centric and agentic LLMs with mismatched training objectives
Addressing inefficiency from overthinking or excessive tool calls on simple queries
Enhancing cost efficiency while maintaining accuracy across diverse benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework with route-then-align principle
Introduces instant mode for simple queries
Uses adaptive policy optimization for efficiency
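The route-then-align idea above can be sketched as a three-way dispatch. The mode names follow the abstract (instant, reasoning, agentic), but the rule-based router below is a stand-in assumption for illustration: in A²FM the routing is learned task-aware behavior, not hand-written conditions.

```python
from enum import Enum

class Mode(Enum):
    INSTANT = "instant"      # answer directly; no reasoning trace, no tools
    REASONING = "reasoning"  # long chain-of-thought, no tool calls
    AGENTIC = "agentic"      # interleave reasoning with tool invocations

def route(needs_tools: bool, is_hard: bool) -> Mode:
    """Toy task-aware router: tools trump everything, hard tool-free
    queries get reasoning, and trivial queries fall through to instant."""
    if needs_tools:
        return Mode.AGENTIC
    return Mode.REASONING if is_hard else Mode.INSTANT
```

The instant branch is what prevents the overthinking and over-calling on simple queries that the Problem section highlights.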
👥 Authors
Qianben Chen — OPPO AI Agent Team
Jingyi Cao — Beijing University of Posts and Telecommunications
Jiayu Zhang — OPPO AI Agent Team
Tianrui Qin — OPPO (Agentic AI, Deep Learning, LLM Security)
Xiaowan Li — OPPO AI Agent Team
King Zhu — OPPO AI Agent Team
Dingfeng Shi — OPPO (Video Analysis, Agentic LLM)
He Zhu — OPPO AI Agent Team
Minghao Liu — OPPO AI Agent Team
Xiaobo Liang — Soochow University (NLP)
Ge Zhang — OPPO AI Agent Team
Jian Yang — OPPO AI Agent Team
Yuchen Eleanor Jiang — OPPO (natural language processing, machine learning)
Wangchunshu Zhou — OPPO & M-A-P (artificial general intelligence, language agents, large language models, natural language processing)