AutoOR: Scalably Post-training LLMs to Autoformalize Operations Research Problems

📅 2026-04-17
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work proposes a post-training framework that combines synthetic data generation with reinforcement learning so that large language models can translate natural language descriptions of operations research (OR) problems into solver-ready optimization models, covering linear, mixed-integer, and nonlinear formulations, and thereby reduce reliance on specialized OR expertise. The approach uses solver execution feedback as the reward signal and introduces a curriculum reinforcement learning strategy for a nonlinear problem class involving physical dynamics, bootstrapping from limited initial training data. Applied to an 8B-parameter model, the method achieves state-of-the-art or competitive results across six established OR benchmarks, matching significantly larger frontier models. On the physical-dynamics class, where frontier models score near 0%, the curriculum strategy makes the problems tractable for post-training.

📝 Abstract
Optimization problems are central to decision-making in manufacturing, logistics, scheduling, and other industrial settings. Translating complicated descriptions of these problems into solver-ready formulations requires specialized operations research (OR) expertise, making it hard to scale. We present AutoOR, a scalable synthetic data generation and reinforcement learning pipeline that trains LLMs to autoformalize optimization problems specified in natural language across linear, mixed-integer, and non-linear categories. AutoOR generates verified training data from standard optimization forms and uses solver execution feedback as the reward signal for RL post-training. AutoOR applied to an 8B model achieves state-of-the-art or competitive results across six established OR benchmarks, matching significantly larger frontier models. For a non-linear problem class involving physical dynamics, where frontier models score near 0%, we introduce a curriculum RL strategy that bootstraps from limited initial training data to make this class tractable for post-training. We believe that methods such as AutoOR can significantly accelerate industrial decision-making with AI.
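
The idea of using solver execution feedback as a reward can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the dictionary format for a formulation, the brute-force `solve_small_ip` function standing in for a real solver, and the binary reward that compares the solved optimum against a known reference value. The actual reward design used by AutoOR may differ.

```python
from itertools import product


def solve_small_ip(formulation):
    """Toy stand-in for a real solver: brute-force a small integer program.

    Hypothetical formulation format (an assumption for this sketch):
      'objective':   coefficients c, maximizing c . x
      'constraints': list of (coeffs, rhs) meaning coeffs . x <= rhs
      'bounds':      inclusive integer (lo, hi) bounds per variable
    Returns the optimal objective value, or None if no point is feasible.
    """
    c = formulation["objective"]
    ranges = [range(lo, hi + 1) for lo, hi in formulation["bounds"]]
    best = None
    for x in product(*ranges):
        feasible = all(
            sum(a * xi for a, xi in zip(coeffs, x)) <= rhs
            for coeffs, rhs in formulation["constraints"]
        )
        if not feasible:
            continue
        value = sum(ci * xi for ci, xi in zip(c, x))
        if best is None or value > best:
            best = value
    return best


def solver_reward(candidate, reference_optimum, tol=1e-6):
    """Binary reward: 1.0 only if the candidate solves to the reference optimum."""
    try:
        value = solve_small_ip(candidate)
    except (KeyError, TypeError, ValueError):
        return 0.0  # malformed formulation: the "solver" could not run it
    if value is None:
        return 0.0  # infeasible formulation
    return 1.0 if abs(value - reference_optimum) <= tol else 0.0


# Reference problem: maximize 3x + 2y s.t. x + y <= 4, x + 3y <= 6, 0 <= x, y <= 4.
correct = {
    "objective": [3, 2],
    "constraints": [([1, 1], 4), ([1, 3], 6)],
    "bounds": [(0, 4), (0, 4)],
}
# A candidate that swaps the objective coefficients, as a model might by mistake.
swapped = dict(correct, objective=[2, 3])

print(solver_reward(correct, reference_optimum=12))  # 1.0
print(solver_reward(swapped, reference_optimum=12))  # 0.0
```

A reward of this kind needs no human grading: a formulation that fails to run, is infeasible, or solves to the wrong optimum scores zero. In practice the toy enumerator would be replaced by a real LP, MIP, or NLP solver.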

Problem

Research questions and friction points this paper is trying to address.

autoformalization
optimization problems
operations research
natural language to formal specification
scalable AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

autoformalization
reinforcement learning
synthetic data generation
operations research
curriculum learning