OptiVerse: A Comprehensive Benchmark towards Optimization Problem Solving

📅 2026-04-23

🤖 AI Summary

This study addresses the narrow scope of existing large language model (LLM) evaluation benchmarks, which focus mainly on mathematical programming and combinatorial optimization and therefore cannot comprehensively assess LLM capabilities on complex optimization problems. To bridge this gap, the authors introduce OptiVerse, a benchmark of 1,000 difficulty-stratified problems covering underexplored domains such as stochastic optimization, dynamic optimization, game-theoretic optimization, and optimal control. The work evaluates 22 LLMs and proposes a Dual-View Auditor Agent that improves modeling accuracy without significant time overhead. Experimental results show that even advanced models struggle to exceed 27% accuracy on hard problems, with modeling and logic errors identified as the primary bottleneck.

📝 Abstract

While Large Language Models (LLMs) demonstrate remarkable reasoning, complex optimization tasks remain challenging, requiring domain knowledge and robust implementation. However, existing benchmarks focus narrowly on Mathematical Programming and Combinatorial Optimization, hindering comprehensive evaluation. To address this, we introduce OptiVerse, a comprehensive benchmark of 1,000 curated problems spanning neglected domains, including Stochastic Optimization, Dynamic Optimization, Game Optimization, and Optimal Control, across three difficulty levels: Easy, Medium, and Hard. The experiments with 22 LLMs of different sizes reveal sharp performance degradation on hard problems, where even advanced models like GPT-5.2 and Gemini-3 struggle to exceed 27% accuracy. Through error analysis, we identify that modeling & logic errors remain the primary bottleneck. Consequently, we propose a Dual-View Auditor Agent that improves the accuracy of the LLM modeling process without introducing significant time overhead. OptiVerse will serve as a foundational platform for advancing LLMs in solving complex optimization challenges.
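
As a rough illustration of what difficulty-stratified accuracy means, the sketch below scores a solver's objective values against reference values per difficulty level. It is not the OptiVerse evaluation code: the function name `accuracy_by_difficulty`, the record layout `(difficulty, predicted, reference)`, and the relative tolerance `rel_tol` are all assumptions made for this example.

```python
from collections import defaultdict


def accuracy_by_difficulty(records, rel_tol=1e-6):
    """Return the fraction of correct answers per difficulty level.

    records: iterable of (difficulty, predicted, reference) tuples, where
    predicted is a float or None (None = the model produced no answer).
    The record layout and tolerance rule are assumptions for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for difficulty, predicted, reference in records:
        total[difficulty] += 1
        if predicted is None:
            continue  # a missing answer counts as incorrect
        # Compare with a relative tolerance, guarded for references near zero.
        if abs(predicted - reference) <= rel_tol * max(1.0, abs(reference)):
            correct[difficulty] += 1
    return {level: correct[level] / total[level] for level in total}


if __name__ == "__main__":
    # Toy records invented for this example; not benchmark data.
    toy = [("Easy", 10.0, 10.0), ("Easy", 3.0, 4.0), ("Hard", None, 1.0)]
    print(accuracy_by_difficulty(toy))
```

Reporting accuracy per level rather than as a single aggregate is what makes a drop on hard problems visible, since a pooled score can be dominated by the easier items.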

Problem

Research questions and friction points this paper is trying to address.

Optimization Problem Solving

Large Language Models

Benchmark

Stochastic Optimization

Optimal Control

Innovation

Methods, ideas, or system contributions that make the work stand out.

OptiVerse

Optimization Benchmark

Large Language Models