🤖 AI Summary
This work addresses the limitations of existing large language model (LLM) agents: rigid, fixed workflows struggle with open-ended problems, while unconstrained self-evolution risks hallucination and instruction drift. To enable controlled self-evolution, the authors propose EvoFSM, a framework that evolves explicit finite state machines (FSMs). EvoFSM decouples optimization into macro-level state-transition logic and micro-level state behaviors, guiding evolution through a critic mechanism within constrained boundaries. By leveraging structured FSMs as the evolutionary substrate, it integrates a self-evolution memory system that distills successful trajectories into reusable priors and encodes failure patterns as constraints, thereby balancing adaptability and stability. Evaluated on five multi-hop question-answering benchmarks, EvoFSM achieves 58.0% accuracy on DeepSearch and demonstrates strong generalization to interactive decision-making tasks.
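The macro/micro decoupling described above can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not EvoFSM's actual code: the names (`flow` as a state-transition table, `skills` as per-state behaviors, `forbidden` as failure-pattern constraints, `rewire` as one constrained evolution operation) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

# Per-state behavior: takes the running context, returns the updated context.
Skill = Callable[[dict], dict]

@dataclass
class EvoFSMSketch:
    flow: Dict[str, str]                 # macro level: state -> next state
    skills: Dict[str, Skill]             # micro level: state -> behavior
    forbidden: Set[Tuple[str, str]] = field(default_factory=set)
    # `forbidden` encodes failure patterns as constraints: transitions
    # that past trajectories showed to be harmful are never taken.

    def run(self, query: str, start: str = "plan", max_steps: int = 10) -> dict:
        state, ctx = start, {"query": query, "trace": []}
        for _ in range(max_steps):
            ctx = self.skills[state](ctx)      # micro: execute state behavior
            ctx["trace"].append(state)
            if state not in self.flow:         # no outgoing transition: terminal
                break
            nxt = self.flow[state]             # macro: follow transition logic
            if (state, nxt) in self.forbidden:
                break
            state = nxt
        return ctx

    def rewire(self, src: str, new_dst: str) -> bool:
        """One constrained evolution operation on the macro Flow:
        redirect a transition, rejecting edits that hit a known
        failure pattern or point at a state with no defined skill."""
        if (src, new_dst) in self.forbidden or new_dst not in self.skills:
            return False
        self.flow[src] = new_dst
        return True
```

In this sketch a critic would observe a trajectory (`ctx["trace"]`), then either call `rewire` to adjust the macro Flow or replace an entry in `skills` to refine a micro behavior, with `forbidden` keeping the search inside safe boundaries.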
📝 Abstract
While LLM-based agents have shown promise for deep research, most existing approaches rely on fixed workflows that struggle to adapt to real-world, open-ended queries. Recent work therefore explores self-evolution by allowing agents to rewrite their own code or prompts to improve problem-solving ability, but unconstrained optimization often triggers instability, hallucinations, and instruction drift. We propose EvoFSM, a structured self-evolving framework that achieves both adaptability and control by evolving an explicit Finite State Machine (FSM) instead of relying on free-form rewriting. EvoFSM decouples the optimization space into macroscopic Flow (state-transition logic) and microscopic Skill (state-specific behaviors), enabling targeted improvements under clear behavioral boundaries. Guided by a critic mechanism, EvoFSM refines the FSM through a small set of constrained operations, and further incorporates a self-evolving memory that distills successful trajectories as reusable priors and failure patterns as constraints for future queries. Extensive evaluations on five multi-hop QA benchmarks demonstrate the effectiveness of EvoFSM. In particular, EvoFSM reaches 58.0% accuracy on the DeepSearch benchmark. Additional results on interactive decision-making tasks further validate its generalization.