Gradients Must Earn Their Influence: Unifying SFT with Generalized Entropic Objectives

📅 2026-02-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Standard supervised fine-tuning (SFT) is sensitive to noisy labels and cannot further sharpen predictions on tokens the model is already confident about, leaving it caught between too much plasticity and too much stability. This work proposes Dynamic Entropy Fine-Tuning (DEFT), a parameter-free objective that embeds SFT within the generalized deformed-logarithm family. Revealing a gate × error structure in the gradients and using Rényi-2 entropy as a proxy for model confidence, DEFT applies a Cayley transform to map uncertainty onto a continuous focusing trajectory, adaptively balancing the retention of established knowledge with the acquisition of new knowledge. Experiments show that DEFT improves performance across multiple tasks, mitigating both overfitting to label noise and learning stagnation, and yielding more robust fine-tuning.
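
The gating idea can be sketched in a few lines: compute the Rényi-2 entropy of the next-token distribution, squash it into (0, 1), and use the result to reweight the per-token cross-entropy. The snippet below is a minimal PyTorch sketch under stated assumptions, not the paper's implementation: the function names (`renyi2_gate`, `deft_style_loss`) are hypothetical, and the simple h/(1+h) squashing stands in for the Cayley-transform mapping described in the paper.

```python
import torch
import torch.nn.functional as F

def renyi2_gate(logits: torch.Tensor) -> torch.Tensor:
    """Per-token trust gate driven by Renyi-2 concentration.

    logits: (batch, seq_len, vocab) pre-softmax scores.
    Returns a (batch, seq_len) weight that is high when the predictive
    distribution is flat (uncertain) and low when it is concentrated.
    """
    probs = logits.softmax(dim=-1)
    # Renyi-2 entropy: H2 = -log(sum_i p_i^2); the collision probability
    # sum_i p_i^2 measures how concentrated the distribution is.
    collision = probs.pow(2).sum(dim=-1)
    h2 = -collision.log()
    # Illustrative squashing of uncertainty into (0, 1); a stand-in for
    # the paper's Cayley-transform mapping, which is not reproduced here.
    return h2 / (1.0 + h2)

def deft_style_loss(logits, targets, ignore_index=-100):
    """Token-level NLL reweighted by the confidence-driven gate (sketch)."""
    nll = F.cross_entropy(
        logits.transpose(1, 2), targets,
        ignore_index=ignore_index, reduction="none",
    )  # (batch, seq_len)
    gate = renyi2_gate(logits).detach()  # gate modulates, it is not trained
    mask = (targets != ignore_index).float()
    return (gate * nll * mask).sum() / mask.sum().clamp_min(1.0)
```

Detaching the gate keeps it as a pure reweighting term, so the backward pass retains the gate × error structure described in the abstract.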

📝 Abstract
Standard negative log-likelihood (NLL) for Supervised Fine-Tuning (SFT) applies uniform token-level weighting. This rigidity creates a two-fold failure mode: (i) overemphasizing low-probability targets can amplify gradients on noisy supervision and disrupt robust priors, and (ii) uniform weighting provides weak sharpening when the model is already confident. Existing methods fail to resolve the resulting plasticity--stability dilemma, often suppressing necessary learning signals alongside harmful ones. To address this issue, we unify token-level SFT objectives within a generalized deformed-log family and expose a universal gate $\times$ error gradient structure, where the gate controls how much the model trusts its current prediction. By employing the Cayley transform, we map the model's continuously evolving uncertainty onto a continuous focus trajectory, which enables seamless interpolation between scenarios involving uncertain novel concepts and those involving well-established knowledge. We then introduce Dynamic Entropy Fine-Tuning (DEFT), a parameter-free objective that modulates the trust gate using distribution concentration (R\'enyi-2 entropy) as a practical proxy for the model's predictive state. Extensive experiments and analyses demonstrate that DEFT achieves a better balance between exploration and exploitation, leading to improved overall performance.
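
For concreteness, one standard member of the deformed-log family is the Tsallis q-logarithm; the derivation below is an illustrative worked instance of the gate $\times$ error structure under that assumption (the paper's exact parameterization and its Cayley-transform schedule for the deformation parameter are not reproduced here):

$$\ln_q(x) = \frac{x^{1-q} - 1}{1 - q}, \qquad \lim_{q \to 1} \ln_q(x) = \ln x.$$

With per-token loss $\mathcal{L}_q = -\ln_q(p_t)$, where $p_t = \operatorname{softmax}(z)_t$ is the probability of the target token, the gradient with respect to a logit $z_j$ factorizes as

$$\frac{\partial \mathcal{L}_q}{\partial z_j} = \underbrace{p_t^{\,1-q}}_{\text{gate}} \cdot \underbrace{\bigl(p_j - \mathbb{1}[j = t]\bigr)}_{\text{error}},$$

so $q = 1$ recovers the uniform weighting of standard NLL, $q < 1$ suppresses gradients on low-probability (potentially noisy) targets, and $q > 1$ amplifies them.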
Problem

Research questions and friction points this paper is trying to address.

Supervised Fine-Tuning
gradient weighting
plasticity-stability dilemma
token-level optimization
model confidence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Entropy Fine-Tuning
Generalized deformed-log
Trust gating
Rényi entropy
Cayley transform
Authors
Zecheng Wang (Harbin Institute of Technology)
Deyuan Liu (Harbin Institute of Technology)
Chunshan Li (Harbin Institute of Technology)
Yupeng Zhang (WeChat, Tencent)
Zhengyun Zhao (Tsinghua University): Large Language Model, Information Retrieval, Medical AI
Dianhui Chu (Harbin Institute of Technology)
Bingning Wang (Baichuan Inc.): NLP, Question Answering, Large Language Model
Dianbo Sui (Harbin Institute of Technology)