PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models

📅 2025-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
High-quality, competition-level mathematical problems are scarce, which hinders the advancement of large language models (LLMs) in advanced mathematical reasoning. Method: The paper proposes a controllable generation paradigm grounded in joint probability maximization over "concept–rationale–problem" triples, integrating prompt engineering, chain-of-thought rationale modeling, conditional probability modeling, and multi-stage generation to emulate the authoring logic of expert problem designers. Contribution/Results: The approach enables interpretable, high-fidelity, high-difficulty Olympiad-style problem generation and achieves state-of-the-art performance on the GSM8K, MATH-500, and AIME2024 benchmarks, outperforming prior problem-generation methods. It also exhibits strong data scalability: performance remains superior as the dataset size increases, demonstrating robust generalization and practical utility for mathematical-reasoning research and education.

📝 Abstract
The ability of large language models to solve complex mathematical problems has progressed significantly, particularly for tasks requiring advanced reasoning. However, the scarcity of sufficiently challenging problems, particularly at the Olympiad level, hinders further advancements. In this work, we introduce PromptCoT, a novel approach for automatically generating high-quality Olympiad-level math problems. The proposed method synthesizes complex problems based on mathematical concepts and the rationale behind problem construction, emulating the thought processes of experienced problem designers. We provide a theoretical analysis demonstrating that an optimal rationale should maximize both the likelihood of rationale generation given the associated concepts and the likelihood of problem generation conditioned on both the rationale and the concepts. Our method is evaluated on standard benchmarks including GSM8K, MATH-500, and AIME2024, where it consistently outperforms existing problem generation methods. Furthermore, we demonstrate that PromptCoT exhibits superior data scalability, consistently maintaining high performance as the dataset size increases, outperforming the baselines. The implementation is available at https://github.com/zhaoxlpku/PromptCoT.
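The theoretical criterion stated in the abstract can be sketched in notation. As a hedged reading (the symbols $c$, $z$, $x$ are illustrative labels for concepts, rationale, and problem, not necessarily the paper's own notation), the optimal rationale jointly maximizes the likelihood of the rationale given the concepts and of the problem given both:

```latex
% c: mathematical concepts, z: rationale, x: generated problem
z^{*} \;=\; \arg\max_{z} \;\Bigl[\, \log p(z \mid c) \;+\; \log p(x \mid z, c) \,\Bigr]
```

Under this reading, problem synthesis factorizes as sampling concepts, generating a rationale conditioned on them, then generating the problem conditioned on both, mirroring the staged authoring process described above.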
Problem

Research questions and friction points this paper is trying to address.

Generates Olympiad-level math problems for LLMs
Addresses scarcity of challenging math problems
Improves reasoning in large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically generates Olympiad-level math problems
Emulates experienced problem designers' thought processes
Superior data scalability and benchmark performance