OptiMind: Teaching LLMs to Think Like Optimization Experts

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the accuracy bottleneck in automatic natural-language-to-mixed-integer-linear-programming (NL-to-MILP) modeling, which stems from scarce high-quality annotated data and insufficient integration of domain expertise, this paper proposes an optimization-knowledge-enhanced large language model (LLM) framework. The method comprises three core components: (1) a data cleaning strategy driven by fine-grained, category-specific error analysis; (2) a class-aware, multi-turn prompting framework structured around MILP semantics; and (3) an iterative validation and refinement mechanism incorporating solver feedback. Extensive experiments across multiple foundation LLMs demonstrate an average 14.2-percentage-point improvement in modeling accuracy, with robustness notably enhanced on critical subtasks such as complex constraint formulation and integer variable identification. The approach establishes an interpretable, solver-verified paradigm for AI-driven operations research modeling.
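To make the NL-to-MILP task concrete, here is a minimal toy instance (illustrative only, not taken from the paper): a one-sentence project-selection problem and the binary MILP a correct formulation would produce. The instance is small enough to solve by brute-force enumeration of the 0-1 variables, which stands in for a real solver.

```python
from itertools import product

# Toy NL-to-MILP instance (illustrative, not from the paper):
# "Choose projects to maximize profit within a budget of 10;
#  projects cost [6, 3, 4, 2] and yield profits [5, 3, 4, 1]."
# Target MILP:  max  sum_i p_i * x_i
#               s.t. sum_i c_i * x_i <= 10,   x_i in {0, 1}
costs = [6, 3, 4, 2]
profits = [5, 3, 4, 1]
budget = 10

best_value, best_x = 0, None
for x in product([0, 1], repeat=len(costs)):  # enumerate binary assignments
    if sum(c * xi for c, xi in zip(costs, x)) <= budget:  # budget constraint
        value = sum(p * xi for p, xi in zip(profits, x))  # objective
        if value > best_value:
            best_value, best_x = value, x

print(best_value, best_x)  # → 9 (1, 0, 1, 0): pick projects 0 and 2
```

A correct LLM formulation must both identify that the x_i are integer (here binary) variables and state the budget constraint exactly; the paper's subtask results suggest these are precisely the steps where models most often fail.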

📝 Abstract
Mathematical programming -- the task of expressing operations and decision-making problems in precise mathematical language -- is fundamental across domains, yet remains a skill-intensive process requiring operations research expertise. Recent advances in large language models for complex reasoning have spurred interest in automating this task, translating natural language into executable optimization models. Current approaches, however, achieve limited accuracy, hindered by scarce and noisy training data without leveraging domain knowledge. In this work, we systematically integrate optimization expertise to improve formulation accuracy for mixed-integer linear programming, a key family of mathematical programs. Our approach first cleans training data through class-based error analysis to explicitly prevent common mistakes within each optimization class. We then develop multi-turn inference strategies that guide LLMs with class-specific error summaries and solver feedback, enabling iterative refinement. Experiments across multiple base LLMs demonstrate that combining cleaned data with domain-informed prompting and feedback improves formulation accuracy by 14 percentage points on average, enabling further progress toward robust LLM-assisted optimization formulation.
Problem

Research questions and friction points this paper is trying to address.

Automating mathematical programming by translating natural language into optimization models
Improving formulation accuracy for mixed-integer linear programming problems
Addressing limited accuracy through domain knowledge integration and data cleaning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cleans training data via class-based error analysis
Guides LLMs with class-specific error summaries
Uses solver feedback for iterative refinement
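The refinement loop described above can be sketched as follows. This is a hedged skeleton under assumptions: `draft_model` and `solver_check` are hypothetical stand-ins for the paper's LLM call and solver-validation step, with hard-coded behavior so the control flow is runnable.

```python
# Hedged sketch of iterative refinement with solver feedback.
# draft_model and solver_check are hypothetical stubs, not the paper's code.

def draft_model(feedback):
    # Stand-in for an LLM formulation call: the first draft omits a
    # constraint; a revised draft adds it once feedback arrives.
    if feedback is None:
        return {"constraints": []}
    return {"constraints": ["budget"]}

def solver_check(model):
    # Stand-in for running the formulation through a MILP solver and
    # validating the solution; returns (ok, feedback_message).
    if "budget" not in model["constraints"]:
        return False, "solution violates the stated budget limit"
    return True, None

feedback, ok = None, False
for turn in range(3):  # bounded number of refinement turns
    model = draft_model(feedback)
    ok, feedback = solver_check(model)
    if ok:
        break

print(ok, turn)  # → True 1: the second draft passes the solver check
```

The design choice the loop illustrates: solver output serves as an executable ground truth, so each turn's feedback is verifiable rather than self-assessed by the model.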
Authors

Zeyi Chen (University of Washington)
Xinzhi Zhang (University of Washington)
Humishka Zope (Stanford University)
Hugo Barbalho (Microsoft Research)
Konstantina Mellou (Microsoft Research)
Marco Molinaro (Microsoft Research and PUC-Rio)
Janardhan Kulkarni (Microsoft Research, Redmond; research areas: Algorithm Design, Optimization Under Uncertainty, Algorithmic Game Theory, Differential Privacy, Machine Learning)
Ishai Menache (Microsoft Research; research areas: optimization, machine learning, cloud computing, supply chain)
Sirui Li (Microsoft Research)