BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving

📅 2024-11-26
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing open-source operations research (OR) datasets lack fine-grained annotations of the modeling process—such as variable and constraint definitions—hindering reinforcement learning (RL) applications in mathematical modeling. To address this, we introduce StructuredOR, the first structured, process-level annotated dataset covering the full OR modeling lifecycle. We further propose BPP-Search, an RL-based algorithm that integrates beam search, process-level reward modeling, and pairwise preference optimization within a tree-of-thought reasoning framework. Evaluated on StructuredOR, NL4OPT, and MAMO-ComplexLP, our approach significantly outperforms state-of-the-art methods: it improves modeling reasoning accuracy while simultaneously enhancing solution efficiency, enabling faster and more robust generation of optimal mathematical models.

📝 Abstract
LLMs exhibit advanced reasoning capabilities, offering the potential to transform natural language questions into mathematical models. However, existing open-source datasets in the operations research domain lack detailed annotations of the modeling process, such as variable definitions, and focus solely on objective values, which hinders reinforcement learning applications. To address this, we release the StructuredOR dataset, annotated with comprehensive labels that capture the complete mathematical modeling process. We further propose BPP-Search, an algorithm that integrates reinforcement learning into a tree-of-thought structure using Beam search, a Process reward model, and a pairwise Preference algorithm. This approach enables efficient exploration of tree structures, avoiding exhaustive search while improving accuracy. Extensive experiments on the StructuredOR, NL4OPT, and MAMO-ComplexLP datasets show that BPP-Search significantly outperforms state-of-the-art methods. In tree-based reasoning, BPP-Search excels in accuracy and efficiency, enabling faster retrieval of correct solutions. The StructuredOR dataset is available at https://github.com/tengwang0318/StructuredOR.
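
To make the idea of process-level annotation concrete, the sketch below shows what a record that labels variables, constraints, and the objective separately might look like, in contrast to a record that stores only the final objective value. The class names, field names, and the toy problem are all hypothetical illustrations; they are not the actual StructuredOR schema, which is defined in the repository linked above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Variable:
    name: str
    description: str
    lower: float = 0.0  # hypothetical field: lower bound on the variable


@dataclass
class Constraint:
    description: str
    coeffs: Dict[str, float]  # variable name -> coefficient
    sense: str                # "<=", ">=" or "=="
    rhs: float


@dataclass
class StructuredModel:
    """Hypothetical process-level record: each modeling step is labeled."""
    question: str
    variables: List[Variable]
    constraints: List[Constraint]
    objective: Dict[str, float]
    maximize: bool = True

    def objective_value(self, point: Dict[str, float]) -> float:
        return sum(c * point[v] for v, c in self.objective.items())

    def is_feasible(self, point: Dict[str, float], tol: float = 1e-9) -> bool:
        # Check variable lower bounds first, then every linear constraint.
        if any(point[v.name] < v.lower - tol for v in self.variables):
            return False
        for con in self.constraints:
            lhs = sum(c * point[v] for v, c in con.coeffs.items())
            if con.sense == "<=" and lhs > con.rhs + tol:
                return False
            if con.sense == ">=" and lhs < con.rhs - tol:
                return False
            if con.sense == "==" and abs(lhs - con.rhs) > tol:
                return False
        return True


# Toy problem invented for illustration only.
model = StructuredModel(
    question="Maximize profit 3x + 2y given 4 machine hours and at most 3 units of x.",
    variables=[Variable("x", "units of product X"), Variable("y", "units of product Y")],
    constraints=[
        Constraint("machine hours", {"x": 1.0, "y": 1.0}, "<=", 4.0),
        Constraint("demand cap on X", {"x": 1.0}, "<=", 3.0),
    ],
    objective={"x": 3.0, "y": 2.0},
)

candidate = {"x": 3.0, "y": 1.0}
print(model.is_feasible(candidate), model.objective_value(candidate))
```

Because each step is stored separately, a partially built model (for example, variables only) can be inspected and scored on its own, which is the kind of intermediate signal that an objective-value-only label cannot provide.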
Problem

Research questions and friction points this paper is trying to address.

Enhancing mathematical modeling process annotation in datasets
Improving tree-of-thought reasoning with reinforcement learning
Increasing accuracy and efficiency in solution retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates reinforcement learning with tree-of-thought
Uses Beam search and Process reward model
Improves accuracy and efficiency in reasoning
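
The three components named above can be sketched together as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `beam_search_with_preference`, the callables `expand`, `process_reward`, and `prefer`, and the toy step labels are all hypothetical stand-ins. In particular, the process reward model and the preference model are replaced here by simple hand-written scoring functions, and the final selection uses a plain win-count tournament.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[str, ...]  # a partial reasoning path: one entry per modeling step


def beam_search_with_preference(
    root: State,
    expand: Callable[[State], Sequence[State]],
    process_reward: Callable[[State], float],
    prefer: Callable[[State, State], bool],
    depth: int,
    beam_width: int,
) -> State:
    """Keep the top `beam_width` partial paths per layer, then pick a winner pairwise."""
    beam: List[State] = [root]
    for _ in range(depth):
        candidates = [child for state in beam for child in expand(state)]
        if not candidates:
            break
        # Score every partial path and prune instead of searching exhaustively.
        candidates.sort(key=process_reward, reverse=True)
        beam = candidates[:beam_width]

    # Final selection: count how many other survivors each candidate beats.
    def wins(state: State) -> int:
        return sum(1 for other in beam if other is not state and prefer(state, other))

    return max(beam, key=wins)


# Toy tree, invented for illustration: each layer offers a sound and a flawed step.
LAYERS = [["vars:ok", "vars:bad"], ["cons:ok", "cons:bad"], ["obj:ok", "obj:bad"]]


def expand(state: State) -> List[State]:
    if len(state) >= len(LAYERS):
        return []
    return [state + (step,) for step in LAYERS[len(state)]]


def process_reward(state: State) -> float:
    # Stand-in for a learned process reward model: count the sound steps.
    return float(sum(step.endswith(":ok") for step in state))


def prefer(a: State, b: State) -> bool:
    # Stand-in for a learned pairwise preference model.
    return process_reward(a) > process_reward(b)


best = beam_search_with_preference((), expand, process_reward, prefer, depth=3, beam_width=2)
print(best)
```

With `beam_width=2`, each layer keeps only two of the expanded paths, so the search scores far fewer nodes than enumerating all eight leaves of the toy tree while still reaching the fully sound path.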
Teng Wang
Department of Mathematics, the University of Hong Kong, Hong Kong SAR, China
Wing-Yin Yu
Noah’s Ark Lab, Huawei, Hong Kong SAR, China
Zhenqi He
The Hong Kong University of Science and Technology (HKUST) | The University of Hong Kong (HKU)
Open-World Learning, Computer Vision, Multi-Modal Learning
Zehua Liu
Department of Mathematics, the University of Hong Kong, Hong Kong SAR, China
Xiongwei Han
AI&OR Principal Researcher at Noah's Ark Lab, Huawei
Intelligence Modeling, LLMs for OR
Hailei Gong
Hailei Gong
Bytedance
LLM Agent, Optimization Theory
Han Wu
Noah’s Ark Lab, Huawei, Hong Kong SAR, China
Wei Shi
Noah’s Ark Lab, Huawei, Hong Kong SAR, China
Ruifeng She
Noah’s Ark Lab, Huawei, Hong Kong SAR, China
Fangzhou Zhu
Noah's Ark Lab, Huawei Technologies
Optimization, Linear Programming, Mixed Integer Programming
Tao Zhong
Noah’s Ark Lab, Huawei, Shenzhen, China