🤖 AI Summary
Existing LLM optimizers rely largely on manual design, which limits their adaptability to specific tasks and leaves the optimizers themselves unoptimized. Method: We propose metaTextGrad, a meta-level framework that automatically optimizes LLM optimizers themselves. It jointly tunes an optimizer's prompt content and reasoning structure through two complementary components: a meta-prompt optimizer and a meta-structure optimizer. Contribution/Results: Evaluated across multiple benchmarks, metaTextGrad achieves an average absolute performance improvement of up to 6% over the best baseline, including optimizers built with DSPy and TextGrad, and enables end-to-end automated customization of both an optimizer's structure and its prompts for a given task.
📝 Abstract
Large language models (LLMs) are increasingly used in learning algorithms, evaluations, and optimization tasks. Recent studies have shown that LLM-based optimizers, which automatically optimize model prompts, demonstrations, predictions themselves, or other components, can significantly enhance the performance of AI systems, as demonstrated by frameworks such as DSPy and TextGrad. However, these optimizers, although built on language models, are usually designed by humans through manual design choices; the optimizers themselves are not optimized. Moreover, they are general-purpose by design, intended to be useful to a broad audience, and are not tailored to specific tasks. To address these challenges, we propose metaTextGrad, which focuses on designing a meta-optimizer that further enhances existing optimizers and aligns them to be good optimizers for a given task. Our approach consists of two key components: a meta-prompt optimizer and a meta-structure optimizer. Their combination significantly improves performance across multiple benchmarks, achieving an average absolute performance improvement of up to 6% compared to the best baseline.
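To make the meta-optimization idea concrete, the sketch below shows one minimal way a meta-prompt optimizer could work: an LLM is asked to rewrite the prompt that the optimizer itself uses, and a rewrite is kept only if it improves a validation score. This is an illustrative sketch under stated assumptions, not the paper's implementation; the names `meta_optimize_prompt`, `evaluate`, and `llm` are hypothetical placeholders, and the greedy accept rule is one simple choice among many.

```python
# Illustrative sketch only; not metaTextGrad's actual API.
# `evaluate` and `llm` are hypothetical callables supplied by the user.
from typing import Callable, Tuple


def meta_optimize_prompt(
    optimizer_prompt: str,
    evaluate: Callable[[str], float],  # scores an optimizer prompt on held-out tasks
    llm: Callable[[str], str],         # any text-in/text-out LLM call
    rounds: int = 3,
) -> Tuple[str, float]:
    """Meta-prompt loop: ask an LLM to rewrite the optimizer's own prompt,
    keeping a rewrite only if it scores better on the validation tasks."""
    best_prompt = optimizer_prompt
    best_score = evaluate(best_prompt)

    for _ in range(rounds):
        # Ask the meta-level LLM to improve the optimizer's prompt for the target task.
        meta_request = (
            "You are a meta-optimizer. Below is the prompt an LLM-based optimizer "
            "uses to improve a downstream system. Rewrite it so the optimizer "
            "produces better updates for the target task. Return only the new prompt.\n\n"
            f"Current optimizer prompt:\n{best_prompt}"
        )
        candidate = llm(meta_request)

        # Greedy acceptance: keep the candidate only if validation performance improves.
        score = evaluate(candidate)
        if score > best_score:
            best_prompt, best_score = candidate, score

    return best_prompt, best_score
```

In this sketch, `evaluate` would run the wrapped optimizer end to end on a small validation set and return task accuracy; a meta-structure optimizer could follow the same accept-if-better pattern while proposing changes to the optimizer's reasoning steps rather than its prompt text.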