BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving

📅 2024-11-26

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Existing open-source operations research (OR) datasets lack fine-grained annotations of the modeling process, such as variable and constraint definitions, which hinders reinforcement learning (RL) applications in mathematical modeling. To address this, the authors introduce StructuredOR, the first structured, process-level annotated dataset covering the full OR modeling lifecycle. They further propose BPP-Search, an RL-based algorithm that integrates beam search, process-level reward modeling, and pairwise preference optimization within a tree-of-thought reasoning framework. Evaluated on StructuredOR, NL4OPT, and MAMO-ComplexLP, the approach significantly outperforms state-of-the-art methods. It improves both the accuracy and the efficiency of tree-based reasoning, so correct mathematical models are retrieved faster.

📝 Abstract

LLMs exhibit advanced reasoning capabilities, offering the potential to transform natural language questions into mathematical models. However, existing open-source datasets in the operations research domain lack detailed annotations of the modeling process, such as variable definitions, and focus solely on objective values, which hinders reinforcement learning applications. To address this, we release the StructuredOR dataset, annotated with comprehensive labels that capture the complete mathematical modeling process. We further propose BPP-Search, an algorithm that integrates reinforcement learning into a tree-of-thought structure using Beam search, a Process reward model, and a pairwise Preference algorithm. This approach enables efficient exploration of tree structures, avoiding exhaustive search while improving accuracy. Extensive experiments on the StructuredOR, NL4OPT, and MAMO-ComplexLP datasets show that BPP-Search significantly outperforms state-of-the-art methods. In tree-based reasoning, BPP-Search excels in accuracy and efficiency, enabling faster retrieval of correct solutions. The StructuredOR dataset is available at https://github.com/tengwang0318/StructuredOR.
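
The abstract names three components (beam search, a process reward model, and a pairwise preference algorithm) without giving details on this page. The sketch below is one plausible way they could fit together, not the authors' implementation. It assumes the process reward model prunes each tree layer and the pairwise preference step is applied only to pick among the surviving final candidates. The function name `bpp_style_search` and the `toy_*` callables are hypothetical stand-ins for an LLM step generator and learned models.

```python
from typing import Callable, Sequence, Tuple

# A state is the tuple of reasoning steps taken so far (hypothetical encoding).
State = Tuple[int, ...]


def bpp_style_search(
    root: State,
    expand: Callable[[State], Sequence[State]],   # proposes next steps (an LLM in practice)
    score: Callable[[State], float],              # stand-in for a process reward model
    prefer: Callable[[State, State], bool],       # stand-in for a pairwise preference model
    beam_width: int,
    depth: int,
) -> State:
    """Beam search over a tree of partial solutions, then a pairwise final pick."""
    beam = [root]
    for _ in range(depth):
        candidates = [child for state in beam for child in expand(state)]
        if not candidates:
            break  # nothing left to expand; keep the current beam
        # Keep only the highest-scoring partial solutions instead of the full layer.
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]

    # Assumption: final selection is a sequential pairwise comparison over the beam.
    best = beam[0]
    for challenger in beam[1:]:
        if prefer(challenger, best):
            best = challenger
    return best


# Toy stand-ins so the sketch runs without any model.
def toy_expand(state: State) -> Sequence[State]:
    return [state + (step,) for step in (0, 1, 2)]


def toy_score(state: State) -> float:
    return float(sum(state))


def toy_prefer(a: State, b: State) -> bool:
    # Arbitrary rule chosen to differ from toy_score: prefer the smaller last step.
    return a[-1] < b[-1]


result = bpp_style_search((), toy_expand, toy_score, toy_prefer, beam_width=2, depth=3)
print(result)
```

With a beam width of 2 and three children per node, each layer scores at most 6 candidates rather than the 27 leaves of the full depth-3 tree, which is the sense in which beam search avoids exhaustive search.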

Problem

Research questions and friction points this paper is trying to address.

Adding process-level annotations of mathematical modeling to datasets

Improving tree-of-thought reasoning with reinforcement learning

Increasing accuracy and efficiency in solution retrieval
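
To make the first point concrete, the sketch below contrasts a record that stores only an objective value with one that also stores the modeling steps. The `ModelingRecord` class, its field names, and the example problem are all hypothetical illustrations; they are not the actual StructuredOR schema, which this page does not describe.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelingRecord:
    """Hypothetical process-level annotation for one OR problem."""
    question: str
    variables: Dict[str, str]   # variable name -> plain-language definition
    constraints: List[str]
    objective: str
    objective_value: float      # the only label an outcome-only dataset would keep

    def steps(self) -> List[str]:
        """Flatten the annotation into ordered steps that a process-level reward could label."""
        out = [f"variable {name}: {desc}" for name, desc in self.variables.items()]
        out += [f"constraint: {c}" for c in self.constraints]
        out.append(f"objective: {self.objective}")
        return out


record = ModelingRecord(
    question="Choose production amounts x and y to maximize profit.",
    variables={"x": "units of product A, x >= 0", "y": "units of product B, y >= 0"},
    constraints=["x + y <= 4", "x <= 2"],
    objective="maximize 3*x + 2*y",
    objective_value=10.0,  # attained at x = 2, y = 2
)

for step in record.steps():
    print(step)
```

Each intermediate step becomes something that can be scored on its own, which an objective value alone does not allow.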

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates reinforcement learning with tree-of-thought

Uses beam search, a process reward model, and a pairwise preference algorithm

Improves accuracy and efficiency in reasoning

🔎 Similar Papers

Boosting of Thoughts: Trial-and-Error Problem Solving with Large Language Models