OptiVerse: A Comprehensive Benchmark towards Optimization Problem Solving

📅 2026-04-23
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses the limited scope of existing large language model (LLM) evaluation benchmarks, which predominantly focus on mathematical programming and combinatorial optimization, thereby failing to comprehensively assess LLM capabilities on complex optimization problems. To bridge this gap, the authors introduce OptiVerse—a comprehensive benchmark encompassing 1,000 difficulty-stratified problems across underexplored domains such as stochastic optimization, dynamic optimization, game-theoretic optimization, and optimal control. The work systematically evaluates 22 prominent LLMs and proposes a novel Dual-View Auditor Agent mechanism that significantly enhances modeling accuracy without substantially increasing computational overhead. Experimental results reveal that even state-of-the-art models achieve less than 27% accuracy on high-difficulty problems, primarily due to modeling and logical errors, while the proposed auditing framework markedly improves LLM performance on complex optimization tasks.

📝 Abstract
While Large Language Models (LLMs) demonstrate remarkable reasoning, complex optimization tasks remain challenging, requiring domain knowledge and robust implementation. However, existing benchmarks focus narrowly on Mathematical Programming and Combinatorial Optimization, hindering comprehensive evaluation. To address this, we introduce OptiVerse, a comprehensive benchmark of 1,000 curated problems spanning neglected domains, including Stochastic Optimization, Dynamic Optimization, Game Optimization, and Optimal Control, across three difficulty levels: Easy, Medium, and Hard. The experiments with 22 LLMs of different sizes reveal sharp performance degradation on hard problems, where even advanced models like GPT-5.2 and Gemini-3 struggle to exceed 27% accuracy. Through error analysis, we identify that modeling & logic errors remain the primary bottleneck. Consequently, we propose a Dual-View Auditor Agent that improves the accuracy of the LLM modeling process without introducing significant time overhead. OptiVerse will serve as a foundational platform for advancing LLMs in solving complex optimization challenges.
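The difficulty-stratified evaluation described above can be illustrated with a minimal sketch. It assumes each problem record carries a difficulty label, a model's predicted objective value, and a reference value, and that an answer counts as correct when the two agree within a relative tolerance. The record fields, the tolerance, and the helper names `is_correct` and `accuracy_by_difficulty` are hypothetical and not taken from the paper; the toy records are illustrative only and are not benchmark results.

```python
import math
from collections import defaultdict


def is_correct(predicted, reference, rel_tol=1e-4):
    """Assumed correctness criterion: objective values agree within a relative tolerance."""
    return math.isclose(predicted, reference, rel_tol=rel_tol, abs_tol=1e-9)


def accuracy_by_difficulty(records):
    """Return the fraction of correctly solved problems for each difficulty level."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        level = record["difficulty"]
        totals[level] += 1
        predicted = record["predicted"]
        # A missing prediction (e.g. the generated code failed to run) counts as incorrect.
        if predicted is not None and is_correct(predicted, record["reference"]):
            hits[level] += 1
    return {level: hits[level] / totals[level] for level in totals}


# Hypothetical toy records, not data from OptiVerse.
records = [
    {"difficulty": "Easy", "predicted": 10.0, "reference": 10.0},
    {"difficulty": "Easy", "predicted": 7.5, "reference": 7.5},
    {"difficulty": "Hard", "predicted": 3.0, "reference": 4.2},
    {"difficulty": "Hard", "predicted": None, "reference": 1.0},
]

if __name__ == "__main__":
    for level, accuracy in accuracy_by_difficulty(records).items():
        print(f"{level}: {accuracy:.0%}")
```

Reporting accuracy per tier rather than as a single aggregate is what makes a drop on Hard problems visible, since strong Easy-tier results would otherwise mask it.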
Problem

Research questions and friction points this paper is trying to address.

Optimization Problem Solving
Large Language Models
Benchmark
Stochastic Optimization
Optimal Control
Innovation

Methods, ideas, or system contributions that make the work stand out.

OptiVerse
optimization benchmark
Large Language Models
Dual-View Auditor Agent
error analysis
Xinyu Zhang
School of Computer Science and Technology, Xi’an Jiaotong University; Ministry of Education Key Laboratory of Intelligent Networks and Network Security, China
Boxuan Zhang
School of Computer Science and Technology, Xi’an Jiaotong University; Ministry of Education Key Laboratory of Intelligent Networks and Network Security, China
Yuchen Wan
School of Computer Science and Technology, Xi’an Jiaotong University; Ministry of Education Key Laboratory of Intelligent Networks and Network Security, China
Lingling Zhang
Assistant Professor, Xi'an Jiaotong University
Computer vision, Few-shot learning, Zero-shot learning
YiXing Yao
School of Computer Science and Technology, Xi’an Jiaotong University; Ministry of Education Key Laboratory of Intelligent Networks and Network Security, China
Bifan Wei
School of Computer Science and Technology, Xi’an Jiaotong University; Ministry of Education Key Laboratory of Intelligent Networks and Network Security, China
Yaqiang Wu
Lenovo
Jun Liu
School of Computer Science and Technology, Xi’an Jiaotong University; Shaanxi Province Key Laboratory of Big Data Knowledge Engineering, China