ORGEval: Graph-Theoretic Evaluation of LLMs in Optimization Modeling

📅 2025-10-31
🤖 AI Summary
Industrial optimization modeling relies heavily on manual effort and domain expertise, while LLM-based automated modeling lacks robust evaluation methodologies. Existing solver-based evaluations suffer from inconsistency, infeasibility, and high computational overhead. This paper proposes the first graph-theoretic framework for model equivalence assessment: optimization models are formalized as structured graphs, and semantic equivalence is rigorously determined via graph isomorphism detection. We introduce the novel notion of “symmetric decomposability” and prove that, under this condition, the Weisfeiler-Lehman (WL) graph isomorphism test is both sound and numerically robust. We further design an efficient WL variant and a dedicated detection algorithm, achieving perfect consistency and zero false positives. Empirically, our method attains 100% consistency under random parameterization and operates 10–100× faster than conventional solvers. We release Bench4Opt, a benchmark dataset, and demonstrate that DeepSeek-V3 and Claude-Opus-4 achieve top zero-shot performance.
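The core machinery the summary describes — deciding graph isomorphism via the Weisfeiler-Lehman test — can be sketched with the classical 1-WL color-refinement algorithm. This is not the paper's tailored variant or its symmetric-decomposability check; the graph encoding, initial labels, and round count below are illustrative assumptions.

```python
# Minimal sketch of 1-WL color refinement, assuming a plain adjacency-dict
# graph encoding (the paper's structured-graph encoding is richer).
from collections import Counter

def wl_colors(adj, labels, rounds=3):
    """Refine node colors: a node's new color hashes its current color
    together with the sorted multiset of its neighbors' colors."""
    colors = dict(labels)  # node -> initial label
    for _ in range(rounds):
        colors = {
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
    return Counter(colors.values())

def wl_test(adj1, labels1, adj2, labels2, rounds=3):
    """Differing color histograms prove non-isomorphism; matching histograms
    imply isomorphism only on WL-distinguishable inputs (e.g., the paper's
    symmetric-decomposable case)."""
    return wl_colors(adj1, labels1, rounds) == wl_colors(adj2, labels2, rounds)

# Two relabelings of the same path graph a-b-c pass the test...
g1 = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
g2 = {'x': ['y'], 'y': ['x', 'z'], 'z': ['y']}
lab = lambda g: {v: 0 for v in g}
print(wl_test(g1, lab(g1), g2, lab(g2)))  # True
# ...while a triangle is distinguished from the path.
g3 = {'p': ['q', 'r'], 'q': ['p', 'r'], 'r': ['p', 'q']}
print(wl_test(g1, lab(g1), g3, lab(g3)))  # False
```

The test runs in near-linear time per round, which is consistent with the summary's claim of large speedups over invoking a solver.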

📝 Abstract
Formulating optimization problems for industrial applications demands significant manual effort and domain expertise. While Large Language Models (LLMs) show promise in automating this process, evaluating their performance remains difficult due to the absence of robust metrics. Existing solver-based approaches often face inconsistency, infeasibility issues, and high computational costs. To address these issues, we propose ORGEval, a graph-theoretic evaluation framework for assessing LLMs' capabilities in formulating linear and mixed-integer linear programs. ORGEval represents optimization models as graphs, reducing equivalence detection to graph isomorphism testing. We identify and prove a sufficient condition, when the tested graphs are symmetric decomposable (SD), under which the Weisfeiler-Lehman (WL) test is guaranteed to correctly detect isomorphism. Building on this, ORGEval integrates a tailored variant of the WL-test with an SD detection algorithm to evaluate model equivalence. By focusing on structural equivalence rather than instance-level configurations, ORGEval is robust to numerical variations. Experimental results show that our method can successfully detect model equivalence and produce 100% consistent results across random parameter configurations, while significantly outperforming solver-based methods in runtime, especially on difficult problems. Leveraging ORGEval, we construct the Bench4Opt dataset and benchmark state-of-the-art LLMs on optimization modeling. Our results reveal that although optimization modeling remains challenging for all LLMs, DeepSeek-V3 and Claude-Opus-4 achieve the highest accuracies under direct prompting, outperforming even leading reasoning models.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' optimization modeling capabilities lacks robust metrics
Existing solver-based methods suffer from inconsistency, infeasibility, and high computational cost
No principled, solver-free way to decide whether two formulations are structurally equivalent
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-theoretic framework for evaluating LLM-generated optimization models
Tailored Weisfeiler-Lehman test combined with symmetric-decomposability (SD) detection
Assesses structural equivalence, making results robust to numerical parameter variations
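The first step of such a framework is turning an optimization model into a graph. A common convention, used here purely as an illustrative assumption (the paper's actual encoding may differ), is a labeled bipartite graph: one node per variable, one per constraint, and an edge wherever a coefficient is nonzero.

```python
# Hedged sketch: encode an LP "max c^T x s.t. A x (sense) b" as a labeled
# bipartite variable-constraint graph. Node names and label tuples are
# illustrative; coefficient values could additionally label edges, omitted
# here for brevity.
def lp_to_graph(c, A, b, senses):
    adj, labels = {}, {}
    for j, c_j in enumerate(c):
        v = f"x{j}"
        adj[v] = []
        labels[v] = ("var", c_j)              # label carries objective coefficient
    for i, row in enumerate(A):
        u = f"c{i}"
        adj[u] = []
        labels[u] = ("con", senses[i], b[i])  # label carries sense and right-hand side
        for j, a_ij in enumerate(row):
            if a_ij != 0:                     # edge per nonzero coefficient
                adj[u].append(f"x{j}")
                adj[f"x{j}"].append(u)
    return adj, labels

# max x0 + 2*x1  s.t.  x0 + x1 <= 4,  x1 <= 3
adj, labels = lp_to_graph([1, 2], [[1, 1], [0, 1]], [4, 3], ["<=", "<="])
print(sorted(adj["c0"]))  # ['x0', 'x1'] -- constraint c0 touches both variables
```

With models in this form, equivalence of two formulations reduces to isomorphism of their labeled graphs, which is where the WL-style test comes in.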
👥 Authors
Zhuohan Wang — King's College London (Quantitative Finance, Generative Models, Game Theory)
Ziwei Zhu — The Chinese University of Hong Kong, Shenzhen
Ziniu Li — The Chinese University of Hong Kong, Shenzhen (Machine Learning, Reinforcement Learning, Large Language Models)
Congliang Chen — Ph.D. Student, The Chinese University of Hong Kong, Shenzhen (Optimization, Machine Learning)
Yizhou Han — The Chinese University of Hong Kong, Shenzhen
Yufeng Lin — The Chinese University of Hong Kong, Shenzhen
Zhihang Lin — Xiamen University & Shanghai Innovation Institute (Efficient Artificial Intelligence)
Angyang Gu — The Chinese University of Hong Kong, Shenzhen
Xinglin Hu — The Chinese University of Hong Kong, Shenzhen
Ruoyu Sun — The Chinese University of Hong Kong, Shenzhen; Shenzhen International Center for Industrial and Applied Mathematics
Tian Ding — Shenzhen Research Institute of Big Data