Data Diversification Methods In Alignment Enhance Math Performance In LLMs

📅 2025-07-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit limited mathematical reasoning capabilities due to homogeneity in preference data, which constrains the diversity of reasoning trajectories during alignment. Method: We propose Diversified-ThinkSolve (DTS), a framework that structurally decomposes mathematical problems into semantically and logically complementary reasoning paths. Evaluated against three common preference-data generation methods—temperature sampling, chain-of-thought prompting, and Monte Carlo Tree Search (MCTS)—DTS achieves diverse path generation with minimal computational overhead (1.03× baseline cost) and uses these paths to construct high-quality preference pairs. The diverse trajectories are then leveraged in preference learning to enhance mathematical alignment. Contribution/Results: On the GSM8K and MATH benchmarks, DTS improves accuracy by 7.1% and 4.2%, respectively, over the base model—outperforming full-scale MCTS, which is nearly five times more costly with lower returns. It establishes a cost-effective paradigm for mathematical reasoning alignment through controlled trajectory diversification.

📝 Abstract
While recent advances in preference learning have enhanced alignment with human feedback, mathematical reasoning remains a persistent challenge. We investigate how data diversification strategies in preference optimization can improve the mathematical reasoning abilities of large language models (LLMs). We evaluate three common data generation methods—temperature sampling, Chain-of-Thought prompting, and Monte Carlo Tree Search (MCTS)—and introduce Diversified-ThinkSolve (DTS), a novel structured approach that systematically decomposes problems into diverse reasoning paths. Our results show that with strategically diversified preference data, models can substantially improve mathematical reasoning performance, with the best approach yielding gains of 7.1% on GSM8K and 4.2% on MATH over the base model. Despite its strong performance, DTS incurs only marginal computational overhead (1.03×) compared to the baseline, while MCTS is nearly five times more costly with lower returns. These findings demonstrate that structured exploration of diverse problem-solving methods creates more effective preference data for mathematical alignment than traditional approaches.
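To make the preference-data pipeline concrete, here is a minimal Python sketch of the general recipe the abstract describes: sample several reasoning paths per problem (e.g. via temperature sampling), then build (chosen, rejected) preference pairs by checking final answers. All names here (`sample_paths`, `toy_generate`, etc.) are hypothetical stand-ins, not the paper's actual implementation.

```python
import random

def sample_paths(generate, problem, n=8, temperature=0.9):
    """Sample n candidate reasoning paths; `generate` is any model call."""
    return [generate(problem, temperature) for _ in range(n)]

def build_preference_pairs(paths, gold_answer, extract_answer):
    """Pair each correct path (chosen) with each incorrect one (rejected)."""
    correct = [p for p in paths if extract_answer(p) == gold_answer]
    wrong = [p for p in paths if extract_answer(p) != gold_answer]
    return [(c, w) for c in correct for w in wrong]

# Toy stand-in for an LLM: sometimes produces the right final answer.
def toy_generate(problem, temperature):
    ans = 12 if random.random() < 0.5 else 13
    return f"Step 1: reason about '{problem}'. Final answer: {ans}"

def toy_extract(path):
    # Parse the integer after the last colon in the path text.
    return int(path.rsplit(":", 1)[1])

random.seed(0)
paths = sample_paths(toy_generate, "3 * 4 = ?", n=6)
pairs = build_preference_pairs(paths, 12, toy_extract)
```

The resulting pairs are the input format expected by preference-optimization objectives such as DPO; the paper's point is that how `sample_paths` diversifies its candidates largely determines how useful these pairs are.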
Problem

Research questions and friction points this paper is trying to address.

Enhancing math reasoning in LLMs via data diversification
Improving preference data quality for mathematical alignment
Reducing computational cost while boosting math performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diversified-ThinkSolve decomposes problems systematically
Temperature sampling and Chain-of-Thought prompting evaluated
Monte Carlo Tree Search incurs high computational cost
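The "ThinkSolve" decomposition named above can be sketched as a two-stage loop: a first model call proposes distinct solution strategies for a problem, and a second call solves the problem once per strategy, yielding structurally diverse paths instead of resampled variations of one path. This is an illustrative guess at the structure; `dts_paths` and the toy callbacks are hypothetical, not the paper's code.

```python
def dts_paths(problem, propose_approaches, solve_with):
    """Decompose-then-solve: enumerate distinct strategies, solve each."""
    approaches = propose_approaches(problem)  # e.g. ["algebra", "enumeration"]
    return [(a, solve_with(problem, a)) for a in approaches]

# Toy stand-ins for the two model calls.
def toy_propose(problem):
    return ["direct multiplication", "repeated addition"]

def toy_solve(problem, approach):
    return f"Using {approach}: 3 * 4 = 12"

paths = dts_paths("3 * 4 = ?", toy_propose, toy_solve)
```

Because each path starts from a different declared strategy, diversity is enforced structurally at roughly one extra call per problem, consistent with the reported 1.03× overhead versus tree search's repeated rollouts.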