AI Summary
To address the overfitting and degraded generalization of high-entropy tokens that arise when autoregressive language models are fine-tuned over multiple rounds on scarce domain data, this paper proposes a structured regularization method dynamically governed by token-level information entropy. The core innovation is a novel entropy-guided, curriculum-style token dropout mechanism; it is the first to explicitly model the dynamic imbalance in token learning difficulty as a principled basis for regularization design, thereby aligning training dynamics with token-level uncertainty. The method integrates entropy estimation, dynamic mask sampling, and curriculum scheduling. Evaluated on models ranging from 0.6B to 8B parameters, it significantly improves training stability and generalization across multiple fine-tuning rounds, consistently outperforming baselines including Dropout and Label Smoothing, and achieving an average accuracy gain of 3.2% on low-resource domain tasks.
Abstract
As access to high-quality, domain-specific data grows increasingly scarce, multi-epoch training has become a practical strategy for adapting large language models (LLMs). However, autoregressive models often suffer from performance degradation under repeated data exposure, where overfitting leads to a marked decline in model capability. Through empirical analysis, we trace this degradation to an imbalance in learning dynamics: predictable, low-entropy tokens are learned quickly and come to dominate optimization, while the model's ability to generalize on high-entropy tokens deteriorates with continued training. To address this, we introduce EntroDrop, an entropy-guided token dropout method that functions as structured data regularization. EntroDrop selectively masks low-entropy tokens during training and employs a curriculum schedule to adjust regularization strength in alignment with training progress. Experiments across model scales from 0.6B to 8B parameters show that EntroDrop consistently outperforms standard regularization baselines and maintains robust performance throughout extended multi-epoch training. These findings underscore the importance of aligning regularization with token-level learning dynamics when training on limited data. Our approach offers a promising pathway toward more effective adaptation of LLMs in data-constrained domains.
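The core mechanism described above can be sketched as follows: estimate each token's predictive entropy, identify low-entropy (already well-learned) tokens, and drop them from the loss with a probability that ramps up over epochs. This is a minimal illustration, not the authors' implementation; the function names, the median-entropy threshold, and the linear curriculum are all assumptions.

```python
# Minimal sketch of an EntroDrop-style loss mask (assumed details, not the
# paper's exact method): low-entropy tokens are stochastically excluded from
# the loss, with drop probability growing linearly over training epochs.
import numpy as np

def token_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy of each token's predictive distribution.
    probs: (seq_len, vocab_size), rows summing to 1."""
    eps = 1e-12  # guard against log(0)
    return -np.sum(probs * np.log(probs + eps), axis=-1)

def entrodrop_mask(probs, epoch, total_epochs, p_max=0.3, quantile=0.5, rng=None):
    """Boolean loss mask: True = token contributes to the loss.
    Tokens below the entropy quantile are 'easy' and are dropped with
    probability p_max * (epoch + 1) / total_epochs (assumed curriculum)."""
    rng = rng or np.random.default_rng(0)
    ent = token_entropy(probs)
    threshold = np.quantile(ent, quantile)       # split easy vs. hard tokens
    p_drop = p_max * (epoch + 1) / total_epochs  # linear curriculum schedule
    low_entropy = ent < threshold
    drop = low_entropy & (rng.random(ent.shape) < p_drop)
    return ~drop

# Toy usage: six tokens over a four-word vocabulary.
probs = np.array([
    [0.97, 0.01, 0.01, 0.01],  # confident prediction -> low entropy
    [0.25, 0.25, 0.25, 0.25],  # uniform -> high entropy, always kept
    [0.90, 0.05, 0.03, 0.02],
    [0.40, 0.30, 0.20, 0.10],
    [0.95, 0.02, 0.02, 0.01],
    [0.30, 0.30, 0.20, 0.20],
])
# Final epoch with p_max=1.0: every low-entropy token is masked out.
mask = entrodrop_mask(probs, epoch=3, total_epochs=4, p_max=1.0)
```

Masking only the loss (rather than the input) keeps the forward pass intact while removing the gradient signal from tokens the model has already fit, which is one plausible way to rebalance optimization toward high-entropy tokens.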