Cross Entropy versus Label Smoothing: A Neural Collapse Perspective

📅 2024-02-06

🏛️ arXiv.org

📈 Citations: 9

✨ Influential: 0

🤖 AI Summary

This work examines why label smoothing helps, viewed through the lens of neural collapse (NC). Empirically, models trained with label smoothing converge faster to neural collapse solutions and reach a stronger level of collapse than models trained with standard cross-entropy. At the same level of NC1 (within-class variability collapse), they also exhibit intensified NC2 (convergence of the class means to a simplex equiangular tight frame). The authors relate these observations to the performance and calibration benefits of label smoothing. On the theory side, they use the unconstrained feature model to derive closed-form global minimizers for both losses and show that label smoothing yields a lower condition number, which in theory implies faster convergence.

📝 Abstract

Label smoothing loss is a widely adopted technique to mitigate overfitting in deep neural networks. This paper studies label smoothing from the perspective of Neural Collapse (NC), a powerful empirical and theoretical framework which characterizes model behavior during the terminal phase of training. We first show empirically that models trained with label smoothing converge faster to neural collapse solutions and attain a stronger level of neural collapse. Additionally, we show that at the same level of NC1, models under label smoothing loss exhibit intensified NC2. These findings provide valuable insights into the performance benefits and enhanced model calibration under label smoothing loss. We then leverage the unconstrained feature model to derive closed-form solutions for the global minimizers for both loss functions and further demonstrate that models under label smoothing have a lower conditioning number and, therefore, theoretically converge faster. Our study, combining empirical evidence and theoretical results, not only provides nuanced insights into the differences between label smoothing and cross-entropy losses, but also serves as an example of how the powerful neural collapse framework can be used to improve our understanding of DNNs.
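For readers unfamiliar with the two losses being compared, the following is a minimal NumPy sketch. It assumes the common parametrization in which the one-hot target is mixed with a uniform distribution using a smoothing weight `delta`; the paper's exact parametrization may differ, and the function names here are illustrative, not taken from the paper.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def smoothed_targets(labels, num_classes, delta):
    # Assumed convention: (1 - delta) * one-hot + delta / K on every class.
    onehot = np.eye(num_classes)[labels]
    return (1.0 - delta) * onehot + delta / num_classes

def label_smoothing_loss(logits, labels, delta=0.1):
    # Cross-entropy between the smoothed targets and the softmax output.
    # Setting delta = 0 recovers the standard cross-entropy loss.
    targets = smoothed_targets(labels, logits.shape[1], delta)
    return float(-(targets * log_softmax(logits)).sum(axis=1).mean())

# Usage: compare the two losses on the same batch of logits.
logits = np.array([[4.0, 0.5, -1.0], [0.2, 3.0, 0.1]])
labels = np.array([0, 1])
ce_loss = label_smoothing_loss(logits, labels, delta=0.0)
ls_loss = label_smoothing_loss(logits, labels, delta=0.1)
```

Under this convention the two losses differ only in the target distribution, so a single function covers both cases.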

Problem

Research questions and friction points this paper is trying to address.

Compares label smoothing and cross-entropy via the neural collapse framework

Analyzes faster convergence and stronger neural collapse with label smoothing

Derives closed-form solutions for global minimizers of both loss functions
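The "stronger neural collapse" mentioned above is usually quantified with scalar metrics computed on last-layer features. The sketch below uses definitions that are common in the neural collapse literature (a within-class to between-class covariance ratio for NC1, and the distance of the class-mean Gram matrix from a simplex ETF for NC2). These are assumptions for illustration; the paper's exact metrics may differ, and `nc_metrics` is a hypothetical helper name.

```python
import numpy as np

def nc_metrics(features, labels):
    """Return (nc1, nc2) for features of shape (N, d) and integer labels.

    Lower values indicate stronger collapse. Assumes roughly balanced
    classes, so the global mean is taken over all samples.
    """
    classes = np.unique(labels)
    K = len(classes)
    d = features.shape[1]
    global_mean = features.mean(axis=0)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    centered_means = means - global_mean  # shape (K, d)

    # Within-class covariance: scatter of samples around their class mean.
    sigma_w = np.zeros((d, d))
    for k, c in enumerate(classes):
        diff = features[labels == c] - means[k]
        sigma_w += diff.T @ diff
    sigma_w /= features.shape[0]

    # Between-class covariance: scatter of class means around the global mean.
    sigma_b = centered_means.T @ centered_means / K

    # NC1: within-class variability relative to between-class variability.
    nc1 = np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / K

    # NC2: distance of the normalized Gram matrix of class means
    # from the (unit Frobenius norm) simplex ETF Gram matrix.
    gram = centered_means @ centered_means.T
    gram = gram / np.linalg.norm(gram)
    etf = (np.eye(K) - np.ones((K, K)) / K) / np.sqrt(K - 1)
    nc2 = np.linalg.norm(gram - etf)
    return float(nc1), float(nc2)
```

Tracking both values over training epochs for a cross-entropy run and a label smoothing run is one way to reproduce the kind of comparison the paper describes.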

Innovation

Methods, ideas, or system contributions that make the work stand out.

Label smoothing accelerates neural collapse convergence

Label smoothing intensifies neural collapse characteristics

Label smoothing lowers the condition number, which in theory yields faster convergence
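As background for the last point, the link between condition number and convergence speed can be seen on a toy quadratic. The sketch below is a generic textbook illustration and does not reproduce the paper's analysis: it runs gradient descent on f(x) = 0.5 * x^T H x for a well-conditioned and an ill-conditioned H (both matrices are made-up examples) and counts the iterations needed to reach the minimizer.

```python
import numpy as np

def gd_iterations(hessian, tol=1e-8, max_iter=100_000):
    # Gradient descent on f(x) = 0.5 * x^T H x, whose minimizer is x = 0.
    # The step size 2 / (lambda_min + lambda_max) is the classical optimal
    # fixed step for a quadratic with a symmetric positive definite H.
    eigvals = np.linalg.eigvalsh(hessian)
    step = 2.0 / (eigvals[0] + eigvals[-1])
    x = np.ones(hessian.shape[0])
    for it in range(max_iter):
        if np.linalg.norm(x) < tol:
            return it
        x = x - step * (hessian @ x)  # gradient of f at x is H x
    return max_iter

well_conditioned = np.diag([1.0, 2.0])    # condition number 2
ill_conditioned = np.diag([1.0, 100.0])   # condition number 100

iters_well = gd_iterations(well_conditioned)
iters_ill = gd_iterations(ill_conditioned)
print(np.linalg.cond(well_conditioned), iters_well)
print(np.linalg.cond(ill_conditioned), iters_ill)
```

With this step size the error contracts by a factor of (κ - 1) / (κ + 1) per iteration, where κ is the condition number, so the ill-conditioned problem needs many more iterations.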
