Scaling Agents via Continual Pre-training

📅 2025-09-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current general-purpose large language models (LLMs) underperform on agentic tasks—such as tool invocation, multi-step reasoning, and environment interaction—primarily because no robust foundation model is endowed with core agentic capabilities. This deficiency forces post-training phases to jointly learn agentic behaviors and expert alignment, creating optimization conflicts. To address this, the authors propose Agentic Continual Pre-training (Agentic CPT), the first framework to decouple foundational agentic capability acquisition from expert behavior alignment. Leveraging diverse, multi-source agentic behavioral data, Agentic CPT employs self-supervised continual pre-training to systematically strengthen tool use, environment interaction, and long-horizon planning. Built on this paradigm, the AgentFounder-30B model achieves state-of-the-art performance across ten agentic benchmarks: 39.9% on BrowseComp-en, 43.3% on BrowseComp-zh, and 31.5% Pass@1 on HLE.
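The core data idea—turning multi-source agent trajectories into plain self-supervised pre-training sequences—can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the tag names (`<think>`, `<tool_call>`, `<tool_response>`) and the helper `trajectory_to_text` are assumptions chosen for clarity.

```python
# Hedged sketch: flatten an agent trajectory (reasoning, tool calls,
# observations) into one text sequence so a standard next-token-prediction
# loss over every token teaches tool invocation and multi-step planning.
# The tag vocabulary below is illustrative, not the paper's actual format.

def trajectory_to_text(question, steps, answer):
    """Serialize one agentic trajectory into a single training string."""
    parts = [f"Question: {question}"]
    for step in steps:
        parts.append(f"<think>{step['thought']}</think>")
        parts.append(f"<tool_call>{step['tool']}({step['args']})</tool_call>")
        parts.append(f"<tool_response>{step['observation']}</tool_response>")
    parts.append(f"Answer: {answer}")
    return "\n".join(parts)

sample = trajectory_to_text(
    "Which year was the transistor invented?",
    [{"thought": "Search the web before answering.",
      "tool": "search",
      "args": "'transistor invention year'",
      "observation": "The first working transistor was built in 1947."}],
    "1947",
)
```

Sequences like `sample` from many sources and tools would then be mixed into an ordinary continual pre-training corpus, leaving expert alignment to a later post-training stage.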

📝 Abstract
Large language models (LLMs) have evolved into agentic systems capable of autonomous tool use and multi-step reasoning for complex problem-solving. However, post-training approaches built upon general-purpose foundation models consistently underperform on agentic tasks, particularly in open-source implementations. We identify the root cause: the absence of robust agentic foundation models forces post-training to simultaneously learn diverse agentic behaviors while aligning them to expert demonstrations, creating fundamental optimization tensions. To this end, we are the first to propose incorporating Agentic Continual Pre-training (Agentic CPT) into the deep research agent training pipeline to build powerful agentic foundation models. Based on this approach, we develop a deep research agent model named AgentFounder. We evaluate AgentFounder-30B on 10 benchmarks and achieve state-of-the-art performance while retaining strong tool-use ability, notably 39.9% on BrowseComp-en, 43.3% on BrowseComp-zh, and 31.5% Pass@1 on HLE.
Problem

Research questions and friction points this paper is trying to address.

Addressing underperformance of LLMs in agentic tasks
Resolving optimization tensions in post-training alignment
Building robust agentic foundation models via continual pre-training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic Continual Pre-training for foundation models
Developed AgentFounder deep research agent model
Achieved state-of-the-art performance on benchmarks