Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability

📅 2024-11-29
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Mathematical reasoning poses a significant challenge for large language models (LLMs) because it demands precise logical deduction, and a single misstep early in a reasoning trajectory can cascade into an incorrect final answer. This paper introduces the concept of *critical tokens*: tokens within a reasoning trajectory that disproportionately influence incorrect outcomes, and shows they diverge substantially from traditional error tokens. The authors identify critical tokens via rollout sampling, propose an efficient contrastive-estimation method for pinpointing them in large-scale datasets, and extend the framework into a token-level training objective, cDPO, built on direct preference optimization (DPO). On the GSM8K and MATH500 benchmarks with Llama-3 (8B and 70B) and Deepseek-math (7B), cDPO consistently improves accuracy over baselines. Code, annotated datasets, and trained models are publicly released.
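The token-level idea can be pictured as a per-token-weighted variant of the standard DPO loss. The snippet below is an illustrative sketch, not the paper's exact cDPO objective: it weights each rejected-trajectory token's log-ratio by a criticality score, so that likely-critical tokens contribute more to the preference margin. All function and variable names here are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def token_weighted_dpo_loss(chosen_logratios, rejected_logratios,
                            rejected_token_weights, beta=0.1):
    """Illustrative token-level DPO-style loss (not the paper's exact form).

    chosen_logratios / rejected_logratios: per-token values of
    log pi_theta(y_t | x, y_<t) - log pi_ref(y_t | x, y_<t).
    rejected_token_weights: criticality scores in [0, 1] that emphasize
    critical tokens in the rejected (incorrect) trajectory.
    """
    chosen_term = sum(chosen_logratios)
    # Weight each rejected token's log-ratio by its criticality score.
    rejected_term = sum(w * r for w, r in zip(rejected_token_weights,
                                              rejected_logratios))
    margin = beta * (chosen_term - rejected_term)
    # Standard Bradley-Terry style negative log-likelihood of the preference.
    return -math.log(sigmoid(margin))
```

With uniform weights of 1.0 this reduces to ordinary sequence-level DPO; raising the weight on a critical rejected token sharpens the penalty on exactly the tokens that derail the reasoning chain.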

📝 Abstract
Mathematical reasoning tasks pose significant challenges for large language models (LLMs) because they require precise logical deduction and sequence analysis. In this work, we introduce the concept of critical tokens -- elements within reasoning trajectories that significantly influence incorrect outcomes. We present a novel framework for identifying these tokens through rollout sampling and demonstrate their substantial divergence from traditional error tokens. Through extensive experiments on datasets such as GSM8K and MATH500, we show that identifying and replacing critical tokens significantly improves model accuracy. We propose an efficient methodology for pinpointing these tokens in large-scale datasets using contrastive estimation and extend this framework to enhance model training processes with direct preference optimization (DPO). Experimental results on GSM8K and MATH500 benchmarks with the widely used models Llama-3 (8B and 70B) and Deepseek-math (7B) demonstrate the effectiveness of the proposed approach, cDPO. Our results underscore the potential of leveraging critical tokens to reduce errors in reasoning tasks, advancing the development of AI systems capable of robust logical deduction. Our code, annotated datasets, and trained models are available at https://github.com/chenzhiling9954/Critical-Tokens-Matter to support and encourage future research in this promising field.
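The rollout-sampling step described in the abstract can be sketched as follows: given a reasoning trajectory, resample completions from each successive token prefix and watch where the success rate collapses. This is a minimal, self-contained sketch under assumed interfaces; `estimate_criticality`, `rollout_fn`, and the toy model below are hypothetical names, not the paper's implementation.

```python
def estimate_criticality(tokens, rollout_fn, gold_answer, num_rollouts=16):
    """Estimate how much each token hurts the chance of a correct answer.

    For every prefix of the trajectory, sample `num_rollouts` continuations
    with `rollout_fn` and record the fraction that reach the gold answer.
    A token whose inclusion causes a sharp drop in this success rate is
    considered critical.
    """
    success = []
    for i in range(len(tokens)):
        prefix = tokens[: i + 1]
        hits = sum(rollout_fn(prefix) == gold_answer
                   for _ in range(num_rollouts))
        success.append(hits / num_rollouts)
    # Criticality of token i = drop in success rate when token i is appended.
    return [0.0 if i == 0 else success[i - 1] - success[i]
            for i in range(len(tokens))]

# Toy deterministic "model": any prefix containing the erroneous step
# "5*9=46" can no longer reach the correct answer.
def toy_rollout(prefix):
    return "0" if "5*9=46" in prefix else "42"

scores = estimate_criticality(["so", "5*9=46", "thus"], toy_rollout, "42")
```

In a real setting `rollout_fn` would be stochastic sampling from the LLM, and the resulting scores would feed the contrastive-estimation and cDPO stages.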
Problem

Research questions and friction points this paper is trying to address.

Mathematical Reasoning
Large Language Models
Accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Critical Tokens
cDPO (DPO extended with token-level contrastive estimation)
Mathematical Reasoning