🤖 AI Summary
Existing open-source operations research (OR) datasets lack fine-grained annotations of the modeling process, such as variable and constraint definitions, which hinders reinforcement learning (RL) applications in mathematical modeling. To address this, we introduce StructuredOR, the first structured dataset with process-level annotations covering the full OR modeling lifecycle. We further propose BPP-Search, an RL-based algorithm that integrates beam search, process-level reward modeling, and pairwise preference optimization within a tree-of-thought reasoning framework. Evaluated on StructuredOR, NL4OPT, and MAMO-ComplexLP, our approach significantly outperforms state-of-the-art methods: in tree-based reasoning it improves both accuracy and efficiency, reaching correct mathematical models faster.
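To make "process-level annotation" concrete, the sketch below shows one way a single annotated problem could be represented. The class names, field names, and the toy problem are all hypothetical illustrations and are not the actual StructuredOR schema; the point is only that variables, constraints, and the objective are recorded as separate steps rather than a lone objective value.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Variable:
    name: str
    domain: str               # e.g. "continuous", "integer", "binary"
    lower_bound: float = 0.0


@dataclass
class Constraint:
    description: str          # natural-language meaning of the constraint
    expression: str           # symbolic form of the constraint


@dataclass
class AnnotatedProblem:
    # Hypothetical record layout, assumed for illustration only.
    question: str
    variables: List[Variable] = field(default_factory=list)
    constraints: List[Constraint] = field(default_factory=list)
    objective_sense: str = "max"
    objective_expression: str = ""
    objective_value: Optional[float] = None


def annotation_steps(problem: AnnotatedProblem) -> List[str]:
    """Flatten the annotation into an ordered list of modeling steps."""
    steps = [f"var {v.name}: {v.domain}, >= {v.lower_bound}" for v in problem.variables]
    steps += [f"s.t. {c.expression}" for c in problem.constraints]
    steps.append(f"{problem.objective_sense} {problem.objective_expression}")
    return steps


# Toy problem: maximize 3x + 2y subject to x + y <= 4 and x <= 3 (optimum 11 at x=3, y=1).
toy = AnnotatedProblem(
    question="Maximize profit from two products under capacity limits.",
    variables=[Variable("x", "continuous"), Variable("y", "continuous")],
    constraints=[
        Constraint("total capacity", "x + y <= 4"),
        Constraint("limit on product x", "x <= 3"),
    ],
    objective_sense="max",
    objective_expression="3*x + 2*y",
    objective_value=11.0,
)
steps = annotation_steps(toy)
```

A dataset that stored only `objective_value` would give a reward signal at the very end of modeling; keeping the intermediate steps is what allows each step to be scored on its own.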
📝 Abstract
Large language models (LLMs) exhibit advanced reasoning capabilities, offering the potential to transform natural language questions into mathematical models. However, existing open-source datasets in the operations research domain lack detailed annotations of the modeling process, such as variable definitions, and focus solely on objective values, which hinders reinforcement learning applications. To address this, we release the StructuredOR dataset, annotated with comprehensive labels that capture the complete mathematical modeling process. We further propose BPP-Search, an algorithm that integrates reinforcement learning into a tree-of-thought structure using Beam search, a Process reward model, and a pairwise Preference algorithm. This approach explores the tree efficiently, avoiding exhaustive search while improving accuracy. Extensive experiments on the StructuredOR, NL4OPT, and MAMO-ComplexLP datasets show that BPP-Search significantly outperforms state-of-the-art methods. In tree-based reasoning, BPP-Search excels in both accuracy and efficiency, enabling faster retrieval of correct solutions. The StructuredOR dataset is available at https://github.com/tengwang0318/StructuredOR.
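The combination of the three components can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say exactly how the pieces interact, so here we assume the process reward model ranks partial models inside the beam and the pairwise preference function picks the final answer among the surviving candidates. The functions `toy_expand`, `toy_score`, and `toy_prefer` are hypothetical stand-ins for an LLM step generator, a trained process reward model, and a trained preference model.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[str, ...]  # a partial model: the modeling steps chosen so far


def beam_search(
    expand: Callable[[State], Sequence[str]],
    score: Callable[[State], float],
    prefer: Callable[[State, State], bool],
    depth: int,
    beam_width: int,
) -> State:
    """Keep the top `beam_width` partial models per layer, then pick a winner pairwise."""
    beam: List[State] = [()]
    for _ in range(depth):
        # Expand every state in the beam by one modeling step.
        candidates = [state + (step,) for state in beam for step in expand(state)]
        if not candidates:
            break
        # Assumed role of the process reward model: rank partial models.
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]
    # Assumed role of the pairwise preference model: choose among finalists.
    best = beam[0]
    for challenger in beam[1:]:
        if prefer(challenger, best):
            best = challenger
    return best


# Hypothetical toy tree: one layer each for variable, constraint, and objective.
LAYERS = [
    ["var x >= 0", "var x free"],
    ["s.t. x <= 3", "s.t. x >= 3"],
    ["max x", "min x"],
]
GOOD = {"var x >= 0", "s.t. x <= 3", "max x"}


def toy_expand(state: State) -> Sequence[str]:
    return LAYERS[len(state)] if len(state) < len(LAYERS) else []


def toy_score(state: State) -> float:
    # Stand-in reward: fraction of steps that match the reference model.
    return sum(step in GOOD for step in state) / max(len(state), 1)


def toy_prefer(a: State, b: State) -> bool:
    return toy_score(a) > toy_score(b)


result = beam_search(toy_expand, toy_score, toy_prefer, depth=3, beam_width=2)
```

With a beam width of 2, this toy run scores 2 + 4 + 4 = 10 candidates, whereas exhaustive enumeration of the three-layer tree would score 2 + 4 + 8 = 14 nodes; the gap widens quickly as depth and branching factor grow, which is the sense in which beam search avoids exhaustive search.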