🤖 AI Summary
Problem: Building intelligent agents for real-world scenarios is hindered by scarce labeled data and by tasks that are complex and change dynamically.
Method: We propose a supervision-free, self-optimizing general-purpose agent centered on a dynamic hierarchical workflow architecture. It combines code-based workflow modeling, a task-flow graph representation, self-reflection mechanisms, and an evolutionary algorithm driven by multi-grid-inspired heuristic graph optimization, so that workflow structures evolve autonomously.
Contribution/Results: Our approach eliminates reliance on annotated data, enabling task-adaptive behavior and continuous hierarchical process optimization. Evaluated on six benchmarks covering programming, mathematical reasoning, and multi-turn question answering, it achieves an average 8.1% improvement over state-of-the-art methods, improving both solution efficiency and generalization on complex reasoning tasks.
📝 Abstract
Large language models (LLMs) excel at solving complex tasks by executing agentic workflows composed of detailed instructions and structured operations. Yet building general-purpose agents by manually embedding foundation models into agentic systems such as Chain-of-Thought, Self-Reflection, and ReAct through text interfaces limits scalability and efficiency. Recently, many researchers have sought to automate the generation and optimization of these workflows through code-based representations. However, existing methods often rely on labeled datasets to train and optimize workflows, which makes them ineffective and inflexible for real-world, dynamic problems where labeled data is unavailable. To address this challenge, we introduce Polymath, a self-optimizing agent with a dynamic hierarchical workflow that leverages the flexibility of task flow graphs and the expressiveness of code-represented workflows to solve a wide range of real-world, dynamic problems. The proposed optimization methodology integrates multi-grid-inspired graph optimization with a self-reflection-guided evolutionary algorithm to refine workflows without labeled data. Experimental results on six benchmark datasets across coding, math, and multi-turn QA tasks show that Polymath achieves an 8.1% average improvement over state-of-the-art baselines.
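To make the idea of label-free workflow evolution more concrete, here is a minimal toy sketch, not Polymath's actual implementation. It assumes a workflow can be stored as a small task flow graph of code steps, and that a self-critique function can score an answer without a reference label. Every name in it (`Workflow`, `draft`, `refine`, `self_critique`, `mutations`, `evolve`) is hypothetical. The LLM calls are replaced by simple string operations, the hierarchy and the multi-grid-inspired graph optimization are omitted, and the search is a plain elitist loop over two structural mutations.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical step signature: (task, outputs of upstream nodes) -> output.
Step = Callable[[str, List[str]], str]


def draft(task: str, inputs: List[str]) -> str:
    """Initial attempt: return the task digits unchanged."""
    return task


def refine(task: str, inputs: List[str]) -> str:
    """One bubble-sort pass over the upstream output (stand-in for an LLM revision)."""
    chars = list(inputs[0])
    for i in range(len(chars) - 1):
        if chars[i] > chars[i + 1]:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


@dataclass
class Workflow:
    steps: Dict[str, Step]              # node name -> code step
    deps: Dict[str, Tuple[str, ...]]    # node name -> upstream node names
    output: str                         # node whose result is the answer

    def run(self, task: str) -> str:
        results: Dict[str, str] = {}
        # Execute nodes so that every dependency runs before its consumers.
        for name in TopologicalSorter(self.deps).static_order():
            upstream = [results[d] for d in self.deps[name]]
            results[name] = self.steps[name](task, upstream)
        return results[self.output]


def self_critique(task: str, answer: str, size: int) -> float:
    """Label-free score: checks properties of the answer, not a gold label."""
    if sorted(answer) != sorted(task):  # answer must reuse exactly the task digits
        return 0.0
    pairs = len(answer) - 1
    ordered = sum(answer[i] <= answer[i + 1] for i in range(pairs))
    quality = ordered / pairs if pairs else 1.0
    return quality - 0.01 * size        # small penalty for larger workflows


def mutations(wf: Workflow) -> Iterator[Workflow]:
    """Two structural edits: append a refine node, or drop the output node."""
    new = f"refine{len(wf.steps)}"
    yield Workflow({**wf.steps, new: refine}, {**wf.deps, new: (wf.output,)}, new)
    if wf.deps[wf.output]:
        parent = wf.deps[wf.output][0]
        steps = {k: v for k, v in wf.steps.items() if k != wf.output}
        deps = {k: v for k, v in wf.deps.items() if k != wf.output}
        yield Workflow(steps, deps, parent)


def evolve(task: str, generations: int = 5) -> Tuple[Workflow, float]:
    """Elitist loop: keep a mutated workflow only if its self-critique improves."""
    best = Workflow({"draft": draft}, {"draft": ()}, "draft")
    best_score = self_critique(task, best.run(task), len(best.steps))
    for _ in range(generations):
        for cand in mutations(best):
            score = self_critique(task, cand.run(task), len(cand.steps))
            if score > best_score:
                best, best_score = cand, score
    return best, best_score


if __name__ == "__main__":
    wf, score = evolve("3141")
    print(list(wf.steps), wf.run("3141"), round(score, 2))
```

In this toy setting the loop grows the graph only while the self-critique score improves, so the size penalty stops it from adding steps that no longer help. A real system would presumably replace `refine` with model-generated operators and `self_critique` with a reflection prompt; those substitutions are assumptions here, not details taken from the abstract.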