🤖 AI Summary
This work addresses the limited reasoning capability of large language models (LLMs). We propose Dynamic Inserting Tokens Training (DIT), a method that automatically identifies the position where the model is least confident—measured by token-level log-likelihood—and inserts a [PAUSE] token there to enhance subsequent reasoning. Unlike heuristic or fixed-position strategies, DIT is a data-driven, model-adaptive, sequence-level intervention. It requires no architectural modifications and is compatible with Transformer-based LLMs ranging from 2.7B to 8B parameters. Evaluated on GSM8K, AQUA-RAT, and MBPP, DIT improves accuracy by up to 4.7%p and 3.23%p on the first two benchmarks and pass@1 by up to 3.4%p on the third, outperforming both standard fine-tuning and existing token-insertion methods. The approach demonstrates that targeted, confidence-guided token insertion can effectively augment reasoning without altering model structure or training objectives.
📝 Abstract
To enhance reasoning capabilities, previous works have explored incorporating special-purpose tokens into the training process. These strategies strengthen the learning mechanism of transformer-based large language models (LLMs). Building on prior research showing that inserting dummy tokens consecutively just before reasoning steps can be effective, we introduce a novel approach termed Dynamic Inserting Tokens Training (DIT). Our method identifies positions within sequences where model confidence is lowest according to token log-likelihood. Strategically inserting [PAUSE] tokens at these positions bolsters the model's predictive capability for subsequent tokens. Experimental results across diverse datasets and models, ranging from 2.7B to 8B parameters, demonstrate that DIT consistently outperforms traditional fine-tuning and previous token-insertion methods. With this simple yet effective method, we achieve accuracy gains of up to 4.7%p on GSM8K and 3.23%p on AQUA-RAT, and pass@1 improvements of up to 3.4%p on MBPP. Our work presents a model-based, dynamic approach rather than a heuristic one, thereby broadening the scope of research in reasoning.
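The core insertion step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes per-token log-likelihoods have already been computed by the model, and that the [PAUSE] token is placed immediately before the lowest-confidence token (the exact placement rule is an assumption). The function name is hypothetical.

```python
def insert_pause_at_lowest_confidence(tokens, log_likelihoods, pause_token="[PAUSE]"):
    """Insert a [PAUSE] token immediately before the position where the
    model's token-level log-likelihood (i.e., its confidence) is lowest.

    tokens: list of token strings in the training sequence
    log_likelihoods: per-token log-likelihoods under the model (same length)
    """
    assert len(tokens) == len(log_likelihoods)
    # Index of the minimum-confidence token: the one the model is least sure about.
    pos = min(range(len(log_likelihoods)), key=lambda i: log_likelihoods[i])
    # Placing [PAUSE] just before that token gives the model extra
    # computation before it must predict the uncertain token.
    return tokens[:pos] + [pause_token] + tokens[pos:]


# Example: the model is least confident about "42" (log-likelihood -2.3).
seq = ["The", "answer", "is", "42", "."]
lls = [-0.2, -0.4, -0.1, -2.3, -0.3]
print(insert_pause_at_lowest_confidence(seq, lls))
```

In practice the augmented sequence would then be used for fine-tuning, so the model learns to exploit the extra [PAUSE] step at exactly the positions where its predictions are weakest.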