ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges

📅 2025-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing mathematical reasoning benchmarks fail to capture the complexity of real-world, open-ended, interdisciplinary modeling problems. To address this, we propose ModelingBench, the first interdisciplinary mathematical modeling benchmark that supports solution multiplicity and expert-in-the-loop evaluation. We further introduce ModelingAgent, a modeling framework built on multi-agent collaboration, tool-augmented reasoning, iterative self-reflection, and structured workflows. Additionally, we develop ModelingJudge, an expert-in-the-loop, LLM-based adjudication system that enables domain-specific assessment. Experiments show that ModelingAgent significantly outperforms strong baselines and generates solutions indistinguishable from those produced by human experts across multiple authentic modeling tasks, validating its effectiveness and robustness on open-ended, interdisciplinary problems.

📝 Abstract
Recent progress in large language models (LLMs) has enabled substantial advances in solving mathematical problems. However, existing benchmarks often fail to reflect the complexity of real-world problems, which demand open-ended, interdisciplinary reasoning and integration of computational tools. To address this gap, we introduce ModelingBench, a novel benchmark featuring real-world-inspired, open-ended problems from math modeling competitions across diverse domains, ranging from urban traffic optimization to ecosystem resource planning. These tasks require translating natural language into formal mathematical formulations, applying appropriate tools, and producing structured, defensible reports. ModelingBench also supports multiple valid solutions, capturing the ambiguity and creativity of practical modeling. We also present ModelingAgent, a multi-agent framework that coordinates tool use, supports structured workflows, and enables iterative self-refinement to generate well-grounded, creative solutions. To evaluate outputs, we further propose ModelingJudge, an expert-in-the-loop system leveraging LLMs as domain-specialized judges assessing solutions from multiple expert perspectives. Empirical results show that ModelingAgent substantially outperforms strong baselines and often produces solutions indistinguishable from those of human experts. Together, our work provides a comprehensive framework for evaluating and advancing real-world problem-solving in open-ended, interdisciplinary modeling challenges.
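The abstract's description of ModelingAgent (coordinated tool use, structured workflows, iterative self-refinement) can be sketched as a draft-critique-refine loop. This is a minimal illustrative sketch, not the paper's actual system: every function name, the scoring heuristic, and the stopping rule are assumptions standing in for LLM calls.

```python
# Hypothetical sketch of ModelingAgent-style iterative self-refinement.
# All functions below are toy stand-ins for LLM and tool calls.

def draft_solution(problem: str) -> str:
    # Stand-in for an LLM call that drafts a modeling report.
    return f"Draft model for: {problem}"

def critique(solution: str) -> tuple[float, str]:
    # Stand-in for a self-reflection step returning a score and feedback.
    # Toy heuristic: quality rises with each revision pass.
    score = min(1.0, 0.4 + 0.2 * solution.count("[revised"))
    return score, "add sensitivity analysis"

def refine(solution: str, feedback: str) -> str:
    # Stand-in for an LLM revision conditioned on the critique.
    return f"{solution} [revised: {feedback}]"

def solve(problem: str, threshold: float = 0.9, max_rounds: int = 5) -> str:
    # Draft once, then critique and refine until the score clears the bar.
    solution = draft_solution(problem)
    for _ in range(max_rounds):
        score, feedback = critique(solution)
        if score >= threshold:
            break
        solution = refine(solution, feedback)
    return solution
```

In the real framework the critique and refinement steps would themselves be LLM agents with access to computational tools, and the loop would be bounded by a token or time budget rather than a fixed round count.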
Problem

Research questions and friction points this paper is trying to address.

Bridging LLMs and mathematical modeling for real-world challenges
Addressing complexity gaps in existing benchmarks with open-ended problems
Developing multi-agent framework for interdisciplinary, creative solutions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces ModelingBench for real-world problem benchmarking
Develops ModelingAgent for multi-agent solution generation
Proposes ModelingJudge for expert-in-the-loop evaluation
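The ModelingJudge idea, LLMs prompted as domain-specialized judges assessing a solution from multiple expert perspectives, can be sketched as per-persona scoring followed by aggregation. The personas, keyword rubric, and averaging scheme below are illustrative assumptions, not the paper's design.

```python
# Hypothetical sketch of ModelingJudge-style multi-perspective evaluation.
# judge_as() is a toy stand-in for an LLM prompted with one expert persona.
from statistics import mean

PERSONAS = ["mathematician", "domain expert", "data scientist", "technical writer"]

def judge_as(persona: str, report: str) -> float:
    # Stand-in for an LLM grading the report from one expert's viewpoint.
    # Toy rubric: reward reports that address the persona's core concern.
    concerns = {
        "mathematician": "formulation",
        "domain expert": "assumptions",
        "data scientist": "data",
        "technical writer": "structure",
    }
    return 1.0 if concerns[persona] in report.lower() else 0.0

def judge(report: str) -> float:
    # Aggregate per-persona scores into a single grade in [0, 1].
    return mean(judge_as(p, report) for p in PERSONAS)
```

A real system would replace the keyword rubric with rubric-guided LLM scoring and could weight personas by the problem's domain.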