Gradients Must Earn Their Influence: Unifying SFT with Generalized Entropic Objectives

📅 2026-02-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Standard supervised fine-tuning (SFT) is sensitive to noisy labels and cannot further sharpen predictions on tokens the model is already confident about, leaving it caught between too much plasticity and too much stability. This work proposes Dynamic Entropy Fine-Tuning (DEFT), a parameter-free objective that embeds SFT within the generalized deformed-logarithm family. Revealing a gate × error structure in the gradients and using Rényi-2 entropy as a proxy for model confidence, DEFT applies a Cayley transform to map uncertainty onto a continuous focusing trajectory, adaptively balancing the retention of established knowledge with the acquisition of new knowledge. Experiments show that DEFT improves performance across multiple tasks, mitigating both overfitting to label noise and learning stagnation, and yielding more robust fine-tuning.
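
The gating idea can be sketched in a few lines: compute the Rényi-2 entropy of the next-token distribution, squash it into (0, 1), and use the result to reweight the per-token cross-entropy. The snippet below is a minimal PyTorch sketch under stated assumptions, not the paper's implementation: the function names (`renyi2_gate`, `deft_style_loss`) are hypothetical, and the simple h/(1+h) squashing stands in for the Cayley-transform mapping described in the paper.

```python
import torch
import torch.nn.functional as F

def renyi2_gate(logits: torch.Tensor) -> torch.Tensor:
    """Per-token trust gate driven by Renyi-2 concentration.

    logits: (batch, seq_len, vocab) pre-softmax scores.
    Returns a (batch, seq_len) weight that is high when the predictive
    distribution is flat (uncertain) and low when it is concentrated.
    """
    probs = logits.softmax(dim=-1)
    # Renyi-2 entropy: H2 = -log(sum_i p_i^2); the collision probability
    # sum_i p_i^2 measures how concentrated the distribution is.
    collision = probs.pow(2).sum(dim=-1)
    h2 = -collision.log()
    # Illustrative squashing of uncertainty into (0, 1); a stand-in for
    # the paper's Cayley-transform mapping, which is not reproduced here.
    return h2 / (1.0 + h2)

def deft_style_loss(logits, targets, ignore_index=-100):
    """Token-level NLL reweighted by the confidence-driven gate (sketch)."""
    nll = F.cross_entropy(
        logits.transpose(1, 2), targets,
        ignore_index=ignore_index, reduction="none",
    )  # (batch, seq_len)
    gate = renyi2_gate(logits).detach()  # gate modulates, it is not trained
    mask = (targets != ignore_index).float()
    return (gate * nll * mask).sum() / mask.sum().clamp_min(1.0)
```

Detaching the gate keeps it as a pure reweighting term, so the backward pass retains the gate × error structure described in the abstract.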

📝 Abstract
Standard negative log-likelihood (NLL) for Supervised Fine-Tuning (SFT) applies uniform token-level weighting. This rigidity creates a two-fold failure mode: (i) overemphasizing low-probability targets can amplify gradients on noisy supervision and disrupt robust priors, and (ii) uniform weighting provides weak sharpening when the model is already confident. Existing methods fail to resolve the resulting plasticity--stability dilemma, often suppressing necessary learning signals alongside harmful ones. To address this issue, we unify token-level SFT objectives within a generalized deformed-log family and expose a universal gate $\times$ error gradient structure, where the gate controls how much the model trusts its current prediction. By employing the Cayley transform, we map the model's continuously evolving uncertainty onto a continuous focus trajectory, which enables seamless interpolation between scenarios involving uncertain novel concepts and those involving well-established knowledge. We then introduce Dynamic Entropy Fine-Tuning (DEFT), a parameter-free objective that modulates the trust gate using distribution concentration (R\'enyi-2 entropy) as a practical proxy for the model's predictive state. Extensive experiments and analyses demonstrate that DEFT achieves a better balance between exploration and exploitation, leading to improved overall performance.
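
For concreteness, one standard member of the deformed-log family is the Tsallis q-logarithm; the derivation below is an illustrative worked instance of the gate $\times$ error structure under that assumption (the paper's exact parameterization and its Cayley-transform schedule for the deformation parameter are not reproduced here):

$$\ln_q(x) = \frac{x^{1-q} - 1}{1 - q}, \qquad \lim_{q \to 1} \ln_q(x) = \ln x.$$

With per-token loss $\mathcal{L}_q = -\ln_q(p_t)$, where $p_t = \operatorname{softmax}(z)_t$ is the probability of the target token, the gradient with respect to a logit $z_j$ factorizes as

$$\frac{\partial \mathcal{L}_q}{\partial z_j} = \underbrace{p_t^{\,1-q}}_{\text{gate}} \cdot \underbrace{\bigl(p_j - \mathbb{1}[j = t]\bigr)}_{\text{error}},$$

so $q = 1$ recovers the uniform weighting of standard NLL, $q < 1$ suppresses gradients on low-probability (potentially noisy) targets, and $q > 1$ amplifies them.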
Problem

Research questions and friction points this paper is trying to address.

Supervised Fine-Tuning
gradient weighting
plasticity-stability dilemma
token-level optimization
model confidence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Entropy Fine-Tuning
Generalized deformed-log
Trust gating
Rényi entropy
Cayley transform
Authors
Zecheng Wang (Harbin Institute of Technology)
Deyuan Liu (Harbin Institute of Technology)
Chunshan Li (Harbin Institute of Technology)
Yupeng Zhang (WeChat, Tencent)
Zhengyun Zhao (Tsinghua University): Large Language Model, Information Retrieval, Medical AI
Dianhui Chu (Harbin Institute of Technology)
Bingning Wang (Baichuan Inc.): NLP, Question Answering, Large Language Model
Dianbo Sui (Harbin Institute of Technology)