Performance of LLMs on Stochastic Modeling Operations Research Problems: From Theory to Practice

📅 2025-06-30
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This study addresses the lack of systematic evaluation of large language models (LLMs) on stochastic modeling tasks in operations research, spanning theoretical problems (graduate coursework and Ph.D. qualifying exams) and practical simulation-optimization challenges drawn from the open-source SimOpt library. Method: the authors design a multi-tiered benchmark grounded in probability theory, statistics, and stochastic processes to rigorously assess LLMs' capabilities in uncertainty modeling, analysis, and optimization. Contribution/Results: experiments show that state-of-the-art LLMs perform on par with human experts, excelling in problem comprehension, model formulation, and solution reasoning, though reliably automating the full stochastic modeling pipeline still requires nontrivial work. The study thereby establishes the first comprehensive, reproducible evaluation framework for LLMs on OR problems involving uncertainty, bridging a gap in AI assessment and providing an empirical foundation for AI-augmented decision modeling.

📝 Abstract
Large language models (LLMs) have exhibited expert-level capabilities across various domains. However, their abilities to solve problems in Operations Research (OR) -- the analysis and optimization of mathematical models derived from real-world problems or their verbal descriptions -- remain underexplored. In this work, we take a first step toward evaluating LLMs' abilities to solve stochastic modeling problems, a core class of OR problems characterized by uncertainty and typically involving tools from probability, statistics, and stochastic processes. We manually procure a representative set of graduate-level homework and doctoral qualification-exam problems and test LLMs' abilities to solve them. We further leverage SimOpt, an open-source library of simulation-optimization problems and solvers, to investigate LLMs' abilities to make real-world decisions under uncertainty. Our results show that, though a nontrivial amount of work is still needed to reliably automate the stochastic modeling pipeline in reality, state-of-the-art LLMs demonstrate proficiency on par with human experts in both classroom and practical settings. These findings highlight the potential of building AI agents that assist OR researchers and amplify the real-world impact of OR through automation.
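
To make the problem class concrete, consider the continuous newsvendor model, a classic example of the simulation-optimization problems SimOpt collects: an order quantity must be chosen before random demand is realized. The sketch below is not from the paper; the exponential demand distribution and the price/cost parameters are illustrative assumptions. It estimates the optimal order by sample-average approximation (SAA) and checks the answer against the closed-form critical-fractile solution.

```python
import numpy as np

# Illustrative newsvendor instance (parameters are assumptions, not from the
# paper): order q units at cost c each, sell at price p; unsold units are lost.
rng = np.random.default_rng(0)
p, c = 5.0, 3.0                                    # unit price and unit cost
demand = rng.exponential(scale=50.0, size=10_000)  # simulated demand scenarios

def neg_expected_profit(q: float) -> float:
    """Sample-average estimate of -E[profit] at order quantity q."""
    sales = np.minimum(q, demand)
    return -(p * sales - c * q).mean()

# Simulation optimization in its simplest form: grid search over a Monte Carlo
# estimate of the expected profit.
grid = np.linspace(0.0, 200.0, 401)
q_saa = grid[np.argmin([neg_expected_profit(q) for q in grid])]

# Check against the critical-fractile optimum F(q*) = (p - c) / p; for
# exponential demand with mean 50, q* = -50 * ln(1 - (p - c) / p) ≈ 25.5.
q_exact = -50.0 * np.log(1.0 - (p - c) / p)
print(f"SAA optimum ~ {q_saa:.1f}, closed-form optimum ~ {q_exact:.1f}")
```

Grid search over a simulated objective is the most naive simulation-optimization solver; the solvers collected in SimOpt replace it with adaptive search strategies that spend a limited simulation budget more carefully.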
Problem

Research questions and friction points this paper is trying to address.

Evaluate LLMs' ability to solve stochastic modeling problems, a core class of OR problems involving uncertainty
Test LLMs on graduate-level homework and doctoral qualifying-exam problems
Assess LLMs' decision-making under uncertainty on practical simulation-optimization tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

First systematic benchmark of LLMs on stochastic modeling problems in OR
Use of the open-source SimOpt library to test real-world decision-making under uncertainty (see the workflow sketch after this list)
Evidence that state-of-the-art LLMs perform on par with human experts in both classroom and practical settings
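
For readers unfamiliar with SimOpt, the library pairs simulation-optimization problems with solvers and evaluates each pairing over repeated macroreplications. Below is a minimal sketch of that workflow, following the ProblemSolver interface used in SimOpt's demo scripts; the solver and problem identifiers ("RNDSRCH" for random search, "CNTNEWS-1" for the continuous newsvendor) and the exact method signatures are assumptions that may differ across library versions.

```python
# Minimal SimOpt workflow sketch, modeled on the library's demo scripts.
# Identifiers and signatures are assumptions and may vary by simopt version.
from simopt.experiment_base import ProblemSolver, post_normalize

# Pair the random-search solver with the continuous newsvendor problem.
experiment = ProblemSolver(solver_name="RNDSRCH", problem_name="CNTNEWS-1")

experiment.run(n_macroreps=10)             # independent runs of the solver
experiment.post_replicate(n_postreps=100)  # re-simulate recommended solutions
post_normalize([experiment], n_postreps_init_opt=100)  # scale progress curves
```

Each macroreplication records the solutions the solver recommends as it spends its simulation budget, so the normalized progress curves make different solvers comparable on the same problem.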