TodoEvolve: Learning to Architect Agent Planning Systems

📅 2026-02-08
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing agent planning systems predominantly rely on handcrafted, fixed architectures, which limits their adaptability to the diverse demands of open-domain tasks. This work proposes TodoEvolve, a meta-planning paradigm that autonomously synthesizes and dynamically revises task-specific planning architectures. The approach introduces PlanFactory, a unified modular design space, and uses it to collect planning trajectories for training the Todo-14B model with Impedance-Guided Preference Optimization (IGPO), a multi-objective reinforcement learning objective that jointly targets performance, stability, and token efficiency. Evaluated on five agentic benchmarks, TodoEvolve consistently outperforms carefully engineered planning modules while keeping API costs and runtime overhead low.

📝 Abstract
Planning has become a central capability for contemporary agent systems in navigating complex, long-horizon tasks, yet existing approaches predominantly rely on fixed, hand-crafted planning structures that lack the flexibility to adapt to the structural diversity of open-ended problems. To address this limitation, we introduce TodoEvolve, a meta-planning paradigm that autonomously synthesizes and dynamically revises task-specific planning architectures. Specifically, we first construct PlanFactory, a modular design space that standardizes diverse planning paradigms within a unified codebase encompassing topology, initialization, adaptation, and navigation, thereby providing a common interface for heterogeneous planning patterns. Leveraging PlanFactory, we collect high-quality planning trajectories and train Todo-14B via Impedance-Guided Preference Optimization (IGPO), a multi-objective reinforcement learning objective that encourages the generation of planning systems that are performant, stable, and token-efficient across arbitrary tasks and agent backbones. Empirical evaluations on five agentic benchmarks demonstrate that TodoEvolve consistently surpasses carefully engineered planning modules while maintaining economical API costs and runtime overhead.
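To make the four PlanFactory dimensions named in the abstract (topology, initialization, adaptation, navigation) more concrete, here is a minimal Python sketch of what a common interface over them might look like. It is an illustration only, not the paper's code: the `PlanningSystem` class, the `linear` to-do example, the `run` loop, and the `add:` feedback convention are all hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class PlanningSystem:
    """Hypothetical container for the four dimensions named in the abstract."""
    topology: str                                             # shape of the plan, e.g. "linear"
    initialize: Callable[[str], List[str]]                    # task -> initial steps
    adapt: Callable[[List[str], str], List[str]]              # (steps, feedback) -> revised steps
    navigate: Callable[[List[str], Set[str]], Optional[str]]  # pick the next step, or None


def init_linear(task: str) -> List[str]:
    # Assumption: the task arrives as semicolon-separated sub-goals.
    return [part.strip() for part in task.split(";") if part.strip()]


def adapt_linear(steps: List[str], feedback: str) -> List[str]:
    # Assumption: feedback of the form "add: <step>" appends a new step.
    if feedback.startswith("add:"):
        return steps + [feedback[len("add:"):].strip()]
    return steps


def navigate_linear(steps: List[str], done: Set[str]) -> Optional[str]:
    # Return the first unfinished step, or None when the plan is complete.
    for step in steps:
        if step not in done:
            return step
    return None


linear_todo = PlanningSystem("linear", init_linear, adapt_linear, navigate_linear)


def run(system: PlanningSystem, task: str, feedback_for: Dict[str, str]) -> List[str]:
    """Drive a planning system to completion and return the execution order."""
    steps = system.initialize(task)
    done: Set[str] = set()
    order: List[str] = []
    while (step := system.navigate(steps, done)) is not None:
        order.append(step)   # a real agent would execute the step here
        done.add(step)
        steps = system.adapt(steps, feedback_for.get(step, ""))
    return order


order = run(linear_todo, "search; read; answer", {"read": "add: verify"})
print(order)
```

Under these assumptions, swapping in a different planning pattern would mean supplying a different set of four callables while the `run` loop stays unchanged, which is one way to read the abstract's "common interface for heterogeneous planning patterns".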
Problem

Research questions and friction points this paper is trying to address.

agent planning
planning architecture
structural diversity
open-ended problems
adaptive planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

meta-planning
PlanFactory
Impedance-Guided Preference Optimization
dynamic planning architecture
token-efficient planning