Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 4
Influential: 2
🤖 AI Summary
This work addresses the issue of catastrophic forgetting in supervised fine-tuning (SFT) for domain adaptation, which stems from destructive gradient updates caused by low-entropy yet low-probability “confidently conflicting” samples. To mitigate this, the authors propose Entropy-Adaptive Fine-Tuning (EAFT), a novel approach that introduces token-level entropy as a gradient gating mechanism. EAFT effectively distinguishes between epistemic uncertainty and knowledge conflict, dynamically suppressing harmful updates from conflicting samples while preserving the model’s ability to learn from uncertain ones. Experiments on large language models—including Qwen and GLM—demonstrate that EAFT significantly alleviates degradation in general capabilities across mathematical, medical, and agent-based tasks, while maintaining downstream performance comparable to standard SFT.

📝 Abstract
Supervised Fine-Tuning (SFT) is the standard paradigm for domain adaptation, yet it frequently incurs the cost of catastrophic forgetting. In sharp contrast, on-policy Reinforcement Learning (RL) effectively preserves general capabilities. We investigate this discrepancy and identify a fundamental distributional gap: while RL aligns with the model's internal belief, SFT forces the model to fit external supervision. This mismatch often manifests as "Confident Conflicts": tokens characterized by low probability but low entropy. In these instances, the model is highly confident in its own prediction but is forced to learn a divergent ground truth, triggering destructive gradient updates. To address this, we propose Entropy-Adaptive Fine-Tuning (EAFT). Unlike methods relying solely on prediction probability, EAFT utilizes token-level entropy as a gating mechanism to distinguish between epistemic uncertainty and knowledge conflict. This allows the model to learn from uncertain samples while suppressing gradients on conflicting data. Extensive experiments on the Qwen and GLM series (ranging from 4B to 32B parameters) across mathematical, medical, and agentic domains confirm our hypothesis. EAFT consistently matches the downstream performance of standard SFT while significantly mitigating the degradation of general capabilities.
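The gating idea described in the abstract can be illustrated with a short sketch: compute per-token cross-entropy and predictive entropy, then suppress the loss on tokens where the model is confident (low entropy) yet assigns low probability to the supervised target. The threshold values and the hard zero/one gate below are illustrative assumptions, not the paper's exact formulation, which may use a soft or learned gating rule.

```python
import numpy as np

def entropy_gated_loss(logits, targets, entropy_threshold=1.0, prob_threshold=0.5):
    """Sketch of entropy-adaptive gating for SFT (assumed form, not the
    paper's exact method). logits: (T, V) per-token vocabulary logits;
    targets: (T,) ground-truth token ids."""
    # Softmax with max-subtraction for numerical stability.
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Per-token predictive entropy H = -sum_v p_v log p_v.
    token_entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)

    # Per-token cross-entropy against the supervised target.
    target_prob = probs[np.arange(len(targets)), targets]
    ce = -np.log(target_prob + 1e-12)

    # "Confident conflict": low entropy (model is sure of itself) but the
    # external target is improbable. Gate those tokens out of the loss so
    # they contribute no gradient; uncertain tokens are learned normally.
    conflict = (token_entropy < entropy_threshold) & (target_prob < prob_threshold)
    gate = np.where(conflict, 0.0, 1.0)
    return float((gate * ce).mean())
```

For example, a token whose logits strongly favor one vocabulary item while the label points elsewhere would be gated out, whereas a near-uniform (high-entropy) token still contributes its full cross-entropy term.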
Problem

Research questions and friction points this paper is trying to address.

Catastrophic Forgetting
Supervised Fine-Tuning
Distributional Gap
Confident Conflicts
Entropy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy-Adaptive Fine-Tuning
Confident Conflicts
Catastrophic Forgetting
Token-level Entropy
Knowledge Conflict