🤖 AI Summary
To address the bottleneck in natural-language optimization, where problem modeling and solver selection rely heavily on expert knowledge, this paper proposes the first end-to-end automated solving framework. Methodologically, it introduces a four-role LLM agent architecture (Formulator, Planner, Coder, Code Critic) that jointly performs mathematical formalization, hierarchical strategy planning, executable code generation, and self-reflective debugging; it further incorporates a UCB-driven dynamic plan-switching mechanism to improve robustness. Evaluated on NLP4LP and the non-linear (no-table) subset of Optibench, the framework achieves 88.1% and 71.2% accuracy, respectively, cutting error rates by 58% and 50% relative to the prior state of the art while boosting productivity by up to 3.3×. Its core contributions are the first multi-agent collaborative modeling paradigm for optimization and a learnable debug-scheduling mechanism, substantially lowering the expertise barrier for solving optimization problems.
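The four-role pipeline described above can be sketched as a simple loop. This is a minimal illustration, not the paper's implementation: `llm` is a hypothetical callable (prompt in, text out), `run_code` a hypothetical sandbox executor, and the prompts are placeholders.

```python
# Hypothetical sketch of the Formulator -> Planner -> Coder -> Code Critic flow.
# `llm` and `run_code` are assumed interfaces, not part of the paper's code.

def solve(problem: str, llm, run_code, max_debug_rounds: int = 3):
    # Formulator: natural language -> mathematical formulation
    formulation = llm(f"Formalize as a mathematical program:\n{problem}")
    # Planner: high-level solution strategy before any execution
    plan = llm(f"Propose a solution strategy (solver choice, steps):\n{formulation}")
    # Coder: executable solver code for the chosen plan
    code = llm(f"Write executable solver code for this plan:\n{plan}")
    # Code Critic: run, observe the outcome, and revise on failure
    for _ in range(max_debug_rounds):
        ok, output = run_code(code)
        if ok:
            return output
        code = llm(f"The code failed with:\n{output}\nRevise the code:\n{code}")
    return None  # give up after the debug budget is exhausted
```

In the full framework this inner revise loop is one arm of the UCB-based debug scheduler, which can abandon a plan and switch to an alternative instead of debugging indefinitely.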
📝 Abstract
Optimization plays a vital role in scientific research and practical applications, but formulating a concrete optimization problem described in natural language into a mathematical form, and selecting a suitable solver for it, requires substantial domain expertise. We introduce OptimAI, a framework for solving Optimization problems described in natural language by leveraging LLM-powered AI agents, achieving superior performance over current state-of-the-art methods. Our framework is built upon four key roles: (1) a formulator that translates natural language problem descriptions into precise mathematical formulations; (2) a planner that constructs a high-level solution strategy prior to execution; (3) a coder that produces executable solution code; and (4) a code critic that interacts with the environment and reflects on outcomes to refine future actions. Ablation studies confirm that all roles are essential; removing the planner or code critic results in 5.8× and 3.1× drops in productivity, respectively. Furthermore, we introduce UCB-based debug scheduling to dynamically switch between alternative plans, yielding an additional 3.3× productivity gain. Our design emphasizes multi-agent collaboration, allowing us to conveniently explore the synergistic effect of combining diverse models within a unified system. Our approach attains 88.1% accuracy on the NLP4LP dataset and 71.2% on the Optibench (non-linear w/o table) subset, reducing error rates by 58% and 50%, respectively, over prior best results.
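The UCB-based debug scheduling mentioned in the abstract can be illustrated with the standard UCB1 rule: treat each candidate plan as a bandit arm and pick the plan with the best upper confidence bound on its debug success rate. This is a generic UCB1 sketch under assumed bookkeeping (per-plan success/attempt counts), not the paper's exact scheduler.

```python
import math

def ucb_select(stats, total):
    """Pick a plan index by UCB1.

    stats: list of (successes, attempts) per candidate plan.
    total: total number of attempts made so far across all plans.
    """
    best_idx, best_score = 0, float("-inf")
    for i, (succ, n) in enumerate(stats):
        if n == 0:
            return i  # try every plan at least once
        # empirical success rate + exploration bonus
        score = succ / n + math.sqrt(2.0 * math.log(total) / n)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx
```

A debug loop would call `ucb_select` before each round, spend the round debugging the chosen plan's code, and update that plan's `(successes, attempts)` pair, so effort shifts away from plans whose fixes keep failing.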