🤖 AI Summary
This work addresses the limitations of 4B-parameter models deployed on edge devices, which suffer from catastrophic forgetting, sensitivity to reward-signal noise, and degraded long-context reasoning on complex tasks. The authors propose AgentCPM-Explore, a compact 4B agent model trained with a holistic framework that integrates parameter-space model fusion, reward-signal denoising, and context-aware information refinement to raise the knowledge density and exploration capability of small models. Presented as the first systematic study of training agentic models at the 4B scale, the work argues that the bottleneck of edge-scale models is insufficient inference stability rather than an inherent capacity ceiling. Experimental results show that the 4B model matches or surpasses 8B counterparts on four benchmarks, outperforms larger models such as Claude-4.5-Sonnet on five benchmarks, and reaches 97.09% pass@64 accuracy on GAIA text-based tasks.
📝 Abstract
While Large Language Model (LLM)-based agents have shown remarkable potential for solving complex tasks, existing systems remain heavily reliant on large-scale models, leaving the capabilities of edge-scale models largely underexplored. In this paper, we present the first systematic study on training agentic models at the 4B-parameter scale. We identify three primary bottlenecks hindering the performance of edge-scale models: catastrophic forgetting during Supervised Fine-Tuning (SFT), sensitivity to reward-signal noise during Reinforcement Learning (RL), and reasoning degradation caused by redundant information in long-context scenarios. To address these issues, we propose AgentCPM-Explore, a compact 4B agent model with high knowledge density and strong exploration capability. We introduce a holistic training framework featuring parameter-space model fusion, reward-signal denoising, and contextual information refinement. Through deep exploration, AgentCPM-Explore achieves state-of-the-art (SOTA) performance among 4B-class models, matches or surpasses 8B-class SOTA models on four benchmarks, and even outperforms larger-scale models such as Claude-4.5-Sonnet and DeepSeek-v3.2 on five benchmarks. Notably, AgentCPM-Explore achieves 97.09% accuracy on GAIA text-based tasks under pass@64. These results provide compelling evidence that the bottleneck for edge-scale models is not their inherent capability ceiling, but rather their inference stability. With this training framework, AgentCPM-Explore unlocks the significant, yet previously underestimated, potential of edge-scale models.
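The abstract names parameter-space model fusion as the mechanism for mitigating catastrophic forgetting during SFT. The paper's exact fusion rule is not given here; a minimal sketch of one common realization, element-wise interpolation between a base checkpoint and a fine-tuned checkpoint, is shown below. The `fuse_checkpoints` helper and the `alpha` mixing coefficient are illustrative assumptions, not the authors' actual procedure.

```python
def fuse_checkpoints(base: dict, finetuned: dict, alpha: float = 0.5) -> dict:
    """Fuse two checkpoints parameter-wise: (1 - alpha) * base + alpha * finetuned.

    Both arguments map parameter names to flat lists of weights. A small
    `alpha` keeps the model close to the base weights (preserving prior
    knowledge), while a large `alpha` favors the fine-tuned behavior.
    """
    assert base.keys() == finetuned.keys(), "checkpoints must share parameter names"
    return {
        name: [(1 - alpha) * b + alpha * f
               for b, f in zip(base[name], finetuned[name])]
        for name in base
    }

# Toy example with two tiny "parameter tensors".
base = {"w": [1.0, 2.0], "b": [0.0]}
finetuned = {"w": [3.0, 4.0], "b": [1.0]}
fused = fuse_checkpoints(base, finetuned, alpha=0.5)
# → {"w": [2.0, 3.0], "b": [0.5]}
```

In practice the same interpolation would be applied to full model state dicts (tensor by tensor); the intuition is that staying in the parameter-space neighborhood of the base model limits how much pre-trained knowledge the SFT stage can overwrite.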