The Implicit Bias of Logit Regularization

📅 2026-02-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the mechanism by which logit regularization, with label smoothing as the prominent example, improves model calibration and generalization. Through a theoretical analysis of convex penalties in logit space for linear classification, we uncover an implicit bias that drives logits to cluster around finite, per-sample targets. For Gaussian data, this clustering aligns the weight vector exactly with the Fisher linear discriminant direction. In a signal-plus-noise model, the alignment halves the critical sample complexity, induces grokking in the small-noise limit, and makes generalization robust to noise, deepening the theoretical understanding of label smoothing and of the broader class of logit-regularization methods.

📝 Abstract
Logit regularization, the addition of a convex penalty directly in logit space, is widely used in modern classifiers, with label smoothing as a prominent example. While such methods often improve calibration and generalization, their mechanism remains under-explored. In this work, we analyze a general class of such logit regularizers in the context of linear classification, and demonstrate that they induce an implicit bias of logit clustering around finite per-sample targets. For Gaussian data, or whenever logits are sufficiently clustered, we prove that logit clustering drives the weight vector to align exactly with Fisher's Linear Discriminant. To demonstrate the consequences, we study a simple signal-plus-noise model in which this transition has dramatic effects: Logit regularization halves the critical sample complexity and induces grokking in the small-noise limit, while making generalization robust to noise. Our results extend the theoretical understanding of label smoothing and highlight the efficacy of a broader class of logit-regularization methods.
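The abstract's claim that logit regularization induces clustering around finite per-sample targets can be illustrated with plain label smoothing on a single binary example: the minimizer of the smoothed cross-entropy is a finite logit, whereas the hard-label loss drives the logit to infinity. A minimal NumPy sketch, where the smoothing level `eps` and the gradient-descent loop are illustrative choices and not the paper's setup:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def smoothed_ce(z, eps):
    """Binary cross-entropy of logit z against a label-smoothed target t = 1 - eps."""
    t = 1.0 - eps
    return -(t * np.log(sigmoid(z)) + (1 - t) * np.log(1 - sigmoid(z)))

def minimize_logit(eps, steps=2000, lr=0.5):
    """Gradient descent on the logit of a single positive example."""
    z = 0.0
    t = 1.0 - eps
    for _ in range(steps):
        z -= lr * (sigmoid(z) - t)  # d/dz of smoothed_ce
    return z

eps = 0.1
z_star = minimize_logit(eps)
target = np.log((1 - eps) / eps)  # closed-form finite minimizer, log(9)
print(z_star, target)
```

With hard labels (`eps = 0`) the same loop would push `z` without bound; smoothing clamps it at a finite target, the clustering behavior whose consequences the paper analyzes.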
Problem

Research questions and friction points this paper is trying to address.

logit regularization
implicit bias
label smoothing
generalization
calibration
Innovation

Methods, ideas, or system contributions that make the work stand out.

logit regularization
implicit bias
logit clustering
Fisher's Linear Discriminant
grokking
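The Fisher alignment result above concerns the direction w ∝ Σ⁻¹(μ₊ − μ₋). As a quick refresher on that direction (not a reproduction of the paper's proof), the sketch below estimates it from synthetic two-class Gaussian data with a shared anisotropic covariance and checks that it separates the classes better than the naive mean-difference direction; the dimensionality, means, and covariance are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes with means ±mu/2 and a shared, correlated covariance.
mu = np.array([1.0, 0.0])
Sigma = np.array([[2.0, 1.8],
                  [1.8, 2.0]])
L = np.linalg.cholesky(Sigma)

n = 4000
X_pos = mu / 2 + rng.standard_normal((n, 2)) @ L.T
X_neg = -mu / 2 + rng.standard_normal((n, 2)) @ L.T

def accuracy(w):
    """Threshold the 1-D projection at the midpoint (zero, by symmetry)."""
    return 0.5 * ((X_pos @ w > 0).mean() + (X_neg @ w < 0).mean())

# Fisher's Linear Discriminant direction from empirical class statistics.
mu_hat = X_pos.mean(0) - X_neg.mean(0)
Sigma_hat = 0.5 * (np.cov(X_pos.T) + np.cov(X_neg.T))
w_fld = np.linalg.solve(Sigma_hat, mu_hat)

print(accuracy(mu_hat), accuracy(w_fld))  # FLD wins under correlated noise
```

Because the noise is correlated across coordinates, projecting onto the raw mean difference mixes in noise that the Σ⁻¹ whitening in the FLD direction removes, which is why the second accuracy is markedly higher.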