Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens

📅 2024-10-18

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models are trained on ground-truth tokens but generate from their own autoregressively produced tokens at inference, and this distribution mismatch lets errors accumulate. To address it, this paper proposes two training-stage strategies: (1) Batch-Scheduled Sampling, which mixes ground-truth and model-generated tokens in the input context during offline training; and (2) Reference-Answer-based Correction, which trains the model to correct its own generated sequences against the reference answer without relying on an external oracle model. Experiments on summarization, general question answering, and mathematical question answering show an overall improvement over baseline methods. The results suggest that bringing the training and inference distributions closer together helps generation quality.

📝 Abstract

Language models are often trained to maximize the likelihood of the next token given past tokens in the training dataset. At inference time, however, they are used differently: they generate text sequentially and auto-regressively, using previously generated tokens as input to predict the next one. Marginal differences in predictions at each step can cascade over successive steps, producing distributions that differ from those the model was trained on and potentially leading to unpredictable behavior. This paper proposes two simple approaches, both based on the model's own generations, to address this discrepancy between training and inference. Our first approach is Batch-Scheduled Sampling, where, during training, we stochastically choose between the ground-truth token from the dataset and the model's own generated token as input to predict the next token. This is done in an offline manner, modifying the context window by interleaving ground-truth tokens with those generated by the model. Our second approach is Reference-Answer-based Correction, where we explicitly incorporate a self-correction capability into the model during training. This enables the model to self-correct the gaps between its generated sequences and the ground-truth data without relying on an external oracle model. With these strategies incorporated during training, we observe an overall improvement in performance compared to baseline methods, as demonstrated by extensive experiments on summarization, general question-answering, and math question-answering tasks.
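The token-mixing idea behind Batch-Scheduled Sampling can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the mixing probability `p_model`, and the assumption that the model's tokens are precomputed and aligned position-by-position with the ground truth are all hypothetical choices made for illustration. The sketch only shows the offline interleaving step, in which each input position keeps the ground-truth token or swaps in the model's token, while the prediction targets stay the ground-truth next tokens.

```python
import random
from typing import List, Sequence, Tuple


def mix_tokens(
    ground_truth: Sequence[int],
    generated: Sequence[int],
    p_model: float,
    rng: random.Random,
) -> List[int]:
    """Pick, per position, the model's token with probability p_model,
    otherwise the ground-truth token. (Hypothetical helper.)"""
    if len(ground_truth) != len(generated):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        gen if rng.random() < p_model else gt
        for gt, gen in zip(ground_truth, generated)
    ]


def build_example(
    ground_truth: Sequence[int],
    generated: Sequence[int],
    p_model: float,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Build one (inputs, targets) pair for next-token training.

    Assumption: generated[i] is the model's previously sampled token
    for the position of ground_truth[i], computed offline.
    """
    rng = random.Random(seed)
    # Inputs: the mixed context, excluding the final position.
    inputs = mix_tokens(ground_truth[:-1], generated[:-1], p_model, rng)
    # Targets: always the ground-truth next tokens.
    targets = list(ground_truth[1:])
    return inputs, targets


if __name__ == "__main__":
    gt = [101, 7, 8, 9, 102]
    gen = [101, 7, 5, 9, 102]  # the model disagreed at position 2
    print(build_example(gt, gen, p_model=0.5, seed=1))
```

Setting `p_model=0.0` recovers ordinary teacher forcing, while `p_model=1.0` feeds only the model's own tokens as context; how that probability is scheduled over training is left open here.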

Problem

Research questions and friction points this paper is trying to address.

Language Model Consistency

Error Accumulation

Training-Inference Discrepancy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Training Strategies

Self-Correction Mechanism

Enhanced Model Consistency

🔎 Similar Papers

ToolGen: Unified Tool Retrieval and Calling via Generation