SCOPE: Compress Mathematical Reasoning Steps for Efficient Automated Process Annotation

📅 2025-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the prohibitively high annotation cost of Process Reward Models (PRMs) for mathematical reasoning, this paper proposes a compression-driven automated annotation framework. It translates natural-language reasoning steps into code, normalizes the code via Abstract Syntax Trees, merges semantically equivalent steps, and builds an equivalence-aware prefix tree in which each root-to-leaf path yields a high-quality training sample. By replacing inefficient Monte Carlo sampling, the method reduces annotation complexity from *O(NMK)* to *O(N)*, a 20× speedup that produces a 196K-sample dataset using only 5% of the computational budget of prior methods. On both Best-of-N and ProcessBench benchmarks, it outperforms all existing automated annotation methods. Crucially, it is the first approach to jointly improve PRM training-data quality and annotation efficiency, establishing a new paradigm for low-cost, high-fidelity alignment of mathematical reasoning.

📝 Abstract
Process Reward Models (PRMs) have demonstrated promising results in mathematical reasoning, but existing process annotation approaches, whether through human annotations or Monte Carlo simulations, remain computationally expensive. In this paper, we introduce Step COmpression for Process Estimation (SCOPE), a novel compression-based approach that significantly reduces annotation costs. We first translate natural language reasoning steps into code and normalize them through Abstract Syntax Trees, then merge equivalent steps to construct a prefix tree. Unlike simulation-based methods that waste numerous samples on estimation, SCOPE leverages a compression-based prefix tree where each root-to-leaf path serves as a training sample, reducing the complexity from $O(NMK)$ to $O(N)$. We construct a large-scale dataset containing 196K samples with only 5% of the computational resources required by previous methods. Empirical results demonstrate that PRMs trained on our dataset consistently outperform existing automated annotation approaches on both the Best-of-N strategy and ProcessBench.
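The AST normalization step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: it assumes reasoning steps have already been translated into Python assignments, and it canonicalizes only variable names so that syntactically different but semantically equivalent steps compare equal.

```python
import ast


def normalize(code: str) -> str:
    """Canonicalize variable names in a code step via its AST,
    so equivalent steps map to the same normalized string."""
    tree = ast.parse(code)
    mapping: dict[str, str] = {}

    class Renamer(ast.NodeTransformer):
        def visit_Name(self, node: ast.Name) -> ast.Name:
            # Assign canonical names v0, v1, ... in order of first appearance.
            if node.id not in mapping:
                mapping[node.id] = f"v{len(mapping)}"
            node.id = mapping[node.id]
            return node

    tree = Renamer().visit(tree)
    return ast.dump(tree)


# Two reasoning steps that differ only in variable naming:
step_a = "total = price * qty"
step_b = "s = p * q"
print(normalize(step_a) == normalize(step_b))  # prints True
```

A full system would need richer equivalence rules (e.g. commutativity, constant folding), but even this name-canonicalization suffices to merge many superficially different steps.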
Problem

Research questions and friction points this paper addresses.

- Reduce the computational cost of annotating mathematical reasoning processes
- Compress reasoning steps via code translation and prefix trees
- Improve the efficiency and performance of Process Reward Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

- Translate reasoning steps into code and normalize them via Abstract Syntax Trees
- Merge equivalent steps to construct a prefix tree
- Use the compression-based prefix tree to reduce annotation complexity from O(NMK) to O(N)
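The prefix-tree idea above can be sketched as follows. This is a hedged illustration under the assumption that each solution is a list of already-normalized step strings: shared prefixes merge automatically on insertion, and each root-to-leaf path then serves as one training sample, so the tree is built in a single pass over N solutions rather than via M rollouts per step as in Monte Carlo estimation.

```python
class Node:
    """One node in the equivalence-step prefix tree."""
    def __init__(self) -> None:
        self.children: dict[str, "Node"] = {}  # normalized step -> child node
        self.correct: bool | None = None       # final-answer label, set at leaves


def insert_solution(root: Node, steps: list[str], correct: bool) -> None:
    """Insert one solution; equivalent prefixes merge instead of duplicating."""
    node = root
    for step in steps:
        node = node.children.setdefault(step, Node())
    node.correct = correct


def paths(node: Node, prefix: tuple[str, ...] = ()):
    """Yield each root-to-leaf path as one (steps, label) training sample."""
    if not node.children:
        yield prefix, node.correct
        return
    for step, child in node.children.items():
        yield from paths(child, prefix + (step,))


# Two solutions sharing a first step are stored under one merged prefix:
root = Node()
insert_solution(root, ["x = 1", "y = x + 1"], True)
insert_solution(root, ["x = 1", "y = x - 1"], False)
print(len(root.children), len(list(paths(root))))  # prints 1 2
```

The design choice worth noting is that annotation cost tracks the number of tree nodes, not the number of sampled continuations, which is where the O(NMK) → O(N) reduction comes from.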