MagicAgent: Towards Generalized Agent Planning

📅 2026-02-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited generalization of current large language models in diverse planning tasks, primarily caused by the scarcity of high-quality interactive data and gradient conflicts during multi-task training. To overcome these challenges, we propose MagicAgent, a foundational planning model that introduces a lightweight and scalable synthetic trajectory generation framework. This framework integrates hierarchical task decomposition, tool augmentation, and multi-constraint scheduling to produce synthetic data spanning a broad spectrum of planning scenarios. A two-stage training paradigm—supervised fine-tuning followed by multi-objective reinforcement learning—effectively mitigates inter-task interference and substantially enhances cross-task generalization. Experimental results demonstrate that MagicAgent-32B and MagicAgent-30B-A3B significantly outperform existing open- and closed-source models on benchmarks such as Worfbench and NaturalPlan, achieving a peak accuracy of 86.9%.

📝 Abstract
The evolution of Large Language Models (LLMs) from passive text processors to autonomous agents has established planning as a core component of modern intelligence. However, achieving generalized planning remains elusive, hindered not only by the scarcity of high-quality interaction data but also by inherent conflicts across heterogeneous planning tasks. These challenges result in models that excel at isolated tasks yet struggle to generalize, while existing multi-task training attempts suffer from gradient interference. In this paper, we present \textbf{MagicAgent}, a series of foundation models specifically designed for generalized agent planning. We introduce a lightweight and scalable synthetic data framework that generates high-quality trajectories across diverse planning tasks, including hierarchical task decomposition, tool-augmented planning, multi-constraint scheduling, procedural logic orchestration, and long-horizon tool execution. To mitigate training conflicts, we propose a two-stage training paradigm comprising supervised fine-tuning followed by multi-objective reinforcement learning over both static datasets and dynamic environments. Empirical results demonstrate that MagicAgent-32B and MagicAgent-30B-A3B deliver superior performance, achieving accuracies of $75.1\%$ on Worfbench, $55.9\%$ on NaturalPlan, $57.5\%$ on $τ^2$-Bench, $86.9\%$ on BFCL-v3, and $81.2\%$ on ACEBench, as well as strong results on our in-house MagicEval benchmarks. These results substantially outperform existing sub-100B models and even surpass leading closed-source models.
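The abstract attributes poor multi-task generalization to gradient interference between heterogeneous planning tasks. The paper does not spell out its mitigation mechanism here, but as an illustration of the general problem, the sketch below shows one well-known (and possibly different from the paper's) technique, PCGrad-style gradient surgery: each task's gradient is projected off the conflicting component of the others before they are combined. The function name `pcgrad` and the toy gradients are illustrative, not from the paper.

```python
# Illustrative sketch only (not MagicAgent's actual method): PCGrad-style
# projection to reduce gradient interference in multi-task training.
import numpy as np

def pcgrad(grads):
    """Project each task gradient off the conflicting components of the others.

    grads: list of 1-D numpy arrays, one flattened gradient per task.
    Returns the combined, conflict-reduced update direction.
    """
    projected = [g.astype(float).copy() for g in grads]
    for i, g_i in enumerate(projected):
        for j, g_j in enumerate(grads):
            if i == j:
                continue
            dot = g_i @ g_j
            if dot < 0:  # gradients conflict: remove the opposing component
                g_i -= dot / (g_j @ g_j) * g_j
    return np.mean(projected, axis=0)

# Two toy task gradients that conflict along the x-axis: a naive average
# would largely cancel, while the projected combination keeps the shared
# direction of progress (here, the positive y-axis).
g1 = np.array([1.0, 0.2])
g2 = np.array([-1.0, 1.0])
update = pcgrad([g1, g2])
```

With non-conflicting gradients (positive dot product) the projection is a no-op and the function reduces to plain gradient averaging, which is why it is a popular drop-in fix for the interference problem the abstract describes.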
Problem

Research questions and friction points this paper is trying to address.

generalized planning
large language models
heterogeneous tasks
data scarcity
gradient interference
Innovation

Methods, ideas, or system contributions that make the work stand out.

generalized agent planning
synthetic data generation
multi-objective reinforcement learning
two-stage training paradigm
foundation models for planning
Xuhui Ren
Honor Device Co., Ltd
Shaokang Dong
Honor Device Co., Ltd
Multi-agent RL · RLHF · LLM Agent
Chen Yang
Honor Device Co., Ltd
Qing Gao
Honor Device Co., Ltd
Yunbin Zhao
Honor Device Co., Ltd
Yongsheng Liu
Honor Device Co., Ltd
Xinwei Geng
Harbin Institute of Technology
Neural Machine Translation · Natural Language Processing
Xiang Li
OPPO
Robot Vision · Machine Learning · Object Recognition
Demei Yan
Honor Device Co., Ltd
Yanqing Li
Honor Device Co., Ltd
Chenhao Huang
School of Computer Science, University of Sydney
Distributed data management · Distributed systems
Dingwei Zhu
Fudan University
Junjie Ye
Fudan University
Computer Science · Natural Language Processing · Large Language Models · Tool Learning
Boxuan Yue
Honor Device Co., Ltd
Yingnan Fu
Honor Device Co., Ltd
Mengzhe Lv
Honor Device Co., Ltd
Zezeng Feng
Honor Device Co., Ltd
Boshen Zhou
Honor Device Co., Ltd
Bocheng Wang
Honor Device Co., Ltd
Xuanjing Huang
Fudan University
Yu-Gang Jiang
Professor, Fudan University. IEEE & IAPR Fellow
Video Analysis · Embodied AI · Trustworthy AI
Tao Gui
Fudan University
Qi Zhang
Fudan University
SAGIN · satellite routing
Yunke Zhang
Honor Device Co., Ltd