LLMs for Mathematical Modeling: Towards Bridging the Gap between Natural and Mathematical Languages

📅 2024-05-21
📈 Citations: 3
Influential: 2
🤖 AI Summary
Large language models (LLMs) exhibit significant limitations in translating natural language into formal mathematical language, particularly in mathematical modeling tasks. Method: We propose the first process-oriented evaluation paradigm, featuring a solver-driven automatic verification framework that integrates symbolic solvers (SciPy, Gurobi) and formal verification techniques to assess how accurately LLMs construct ordinary differential equation (ODE), linear programming (LP), and mixed-integer linear programming (MILP) models. Contribution/Results: We release Mamo, an open-source benchmark comprising 1,209 diverse modeling problems, enabling reproducible and fine-grained assessment of modeling capabilities. Empirical evaluation shows that state-of-the-art LLMs remain substantially limited on complex modeling tasks; performance scales positively with model size, and open-weight models approach closed-weight counterparts only on simple instances, with substantial gaps persisting on high-difficulty problems.
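To make the solver-driven comparison concrete, the following is a minimal, illustrative sketch (not the paper's actual implementation) of how an LP model produced by an LLM could be checked against a ground-truth model: both are solved with SciPy's linprog and the generated model is accepted only if the optimal objective values agree within a tolerance. The example problem data, helper function, and tolerance are assumptions for illustration.

```python
# Illustrative sketch of solver-based LP verification (hypothetical example):
# solve the ground-truth model and the LLM-generated model with SciPy's linprog
# and accept the generated model if the optimal objectives agree.
from scipy.optimize import linprog

def solve_lp(c, A_ub, b_ub):
    """Solve min c^T x s.t. A_ub x <= b_ub, x >= 0; return the optimal objective."""
    res = linprog(c=c, A_ub=A_ub, b_ub=b_ub, method="highs")
    return res.fun if res.success else None

# Ground-truth model for a hypothetical word problem.
gt_obj = solve_lp(c=[-3, -5], A_ub=[[1, 0], [0, 2], [3, 2]], b_ub=[4, 12, 18])

# Model extracted from the LLM's answer (same problem, possibly reformulated).
llm_obj = solve_lp(c=[-3, -5], A_ub=[[1, 0], [0, 2], [3, 2]], b_ub=[4, 12, 18])

# Process-oriented check: compare solver outputs, not the model text itself.
TOL = 1e-6
is_correct = (gt_obj is not None and llm_obj is not None
              and abs(gt_obj - llm_obj) <= TOL)
print("model accepted" if is_correct else "model rejected")
```

The same idea extends to MILP by swapping in an integer-capable solver such as Gurobi; the decision depends only on solver outputs rather than on string-matching the generated model against a reference formulation.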

📝 Abstract
Large Language Models (LLMs) have demonstrated strong performance across various natural language processing tasks, yet their proficiency in mathematical reasoning remains a key challenge. Addressing the gap between natural and mathematical language requires advanced reasoning capabilities, approaching those of Artificial General Intelligence (AGI). However, the evaluation remains challenging, as perfectly representing reality is inherently elusive, and traditional methods like manual or direct comparison of mathematical statements (Ramamonjison et al., 2023) are insufficient for assessing true modeling ability. We propose a process-oriented framework to evaluate LLMs' ability to construct mathematical models, using solvers to compare outputs with ground truth. Introducing Mamo, a benchmark with 1,209 questions covering ordinary differential equations, linear programming, and mixed-integer linear programming, we enable automatic evaluation of modeling accuracy. The results show that existing LLMs struggle with complex mathematical modeling tasks, with larger models demonstrating superior performance, while open-source models remain competitive in simpler cases but still fall short of proprietary models in more challenging problems.
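As a companion to the LP sketch above, the following hedged example shows how the solver-based comparison described in the abstract might look for an ODE instance: the reference equation and the equation derived from the LLM's answer are both integrated with SciPy's solve_ivp and compared at a query time point. The specific equation, initial condition, and tolerance are hypothetical, not taken from the benchmark.

```python
# Illustrative sketch of solver-based ODE checking (not the paper's code):
# integrate the reference ODE and the LLM-generated ODE numerically with SciPy
# and compare the solutions at a query time point within a tolerance.
import numpy as np
from scipy.integrate import solve_ivp

def reference_rhs(t, y):
    # Ground-truth model: exponential decay y' = -0.5 * y (hypothetical problem).
    return -0.5 * y

def llm_rhs(t, y):
    # Right-hand side reconstructed from the LLM's modeling answer.
    return -0.5 * y

y0, t_span, t_query = [10.0], (0.0, 4.0), 4.0
ref = solve_ivp(reference_rhs, t_span, y0, dense_output=True)
cand = solve_ivp(llm_rhs, t_span, y0, dense_output=True)

# Accept the generated model if its prediction matches the ground truth.
match = np.allclose(ref.sol(t_query), cand.sol(t_query), rtol=1e-3)
print("model accepted" if match else "model rejected")
```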
Problem

Research questions and friction points this paper is trying to address.

Bridge natural and mathematical language gaps
Evaluate LLMs' mathematical modeling abilities
Assess complex mathematical reasoning proficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs for mathematical modeling
Introducing Mamo benchmark
Automatic evaluation using solvers
Xuhan Huang
The Chinese University of Hong Kong, Shenzhen
Qingning Shen
The Chinese University of Hong Kong, Shenzhen
Yan Hu
The Chinese University of Hong Kong, Shenzhen; Shenzhen Research Institute of Big Data
Anningzhe Gao
Shenzhen Research Institute of Big Data
Benyou Wang
Assistant Professor, The Chinese University of Hong Kong, Shenzhen
Keywords: large language models, natural language processing, information retrieval, applied machine learning