🤖 AI Summary
To address two key bottlenecks in large language model (LLM) prompt optimization—insufficient diversity and cross-task semantic drift—this paper proposes an automatic prompt refinement framework built on a residual optimization tree. Methodologically, it is the first to introduce residual connections into prompt optimization, explicitly suppressing semantic drift; it further designs a text-gradient-driven multi-branch tree search that ensures convergence stability while enhancing exploratory diversity. Core technical components include text-gradient computation, perplexity-based evaluation, tree-structured candidate generation, and gradient-aware pruning. Evaluated across five challenging benchmarks—commonsense, mathematical, logical, temporal, and semantic reasoning—the framework consistently outperforms both handcrafted prompts and state-of-the-art automated prompt optimizers, achieving substantial improvements in generalization and task robustness and demonstrating superior cross-task adaptability and reliability.
📝 Abstract
Recent advancements in large language models (LLMs) have highlighted their potential across a variety of tasks, but their performance still heavily relies on the design of effective prompts. Existing methods for automatic prompt optimization face two challenges: (1) lack of diversity, which limits the exploration of valuable and innovative optimization directions, and (2) semantic drift, where optimizations for one task can degrade performance in others. To address these issues, we propose Residual Optimization Tree (RiOT), a novel framework for automatic prompt optimization. RiOT iteratively refines prompts through text gradients, generating multiple semantically diverse candidates at each step and selecting the best prompt via perplexity. Additionally, RiOT incorporates a text residual connection to mitigate semantic drift by selectively retaining beneficial content across optimization iterations. A tree structure efficiently manages the optimization process, ensuring scalability and flexibility. Extensive experiments across five benchmarks, covering commonsense, mathematical, logical, temporal, and semantic reasoning, demonstrate that RiOT outperforms both previous prompt optimization methods and manual prompting.
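The loop the abstract describes—branch into multiple candidates via text gradients, score by perplexity, keep the best child while a residual connection carries beneficial content forward—can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the LLM-backed `propose` (text-gradient rewriting) and token log-probabilities are stubbed, and all names here are hypothetical.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    """One prompt in the optimization tree."""
    prompt: str
    children: list = field(default_factory=list)

def perplexity(token_logprobs):
    # Perplexity = exp(-mean token log-probability); lower is better.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def riot_step(node, propose, score, branching=3):
    """One expansion step: branch, apply a residual connection, select.

    propose(prompt, k) -> k candidate edits (an LLM text-gradient call
    in the real system; stubbed here).
    score(prompt) -> scalar to minimize (perplexity in the real system).
    """
    edits = propose(node.prompt, branching)
    # Text residual connection (illustrative): each candidate retains the
    # parent prompt's content alongside the proposed edit.
    candidates = [node.prompt + "\n" + e for e in edits]
    best = min(candidates, key=score)
    child = Node(best)
    node.children.append(child)
    return child
```

Iterating `riot_step` from a root node grows the tree one selected child at a time; gradient-aware pruning would simply discard high-scoring (high-perplexity) branches before expansion.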