🤖 AI Summary
Traditional label smoothing (LS) distributes the smoothing probability uniformly across all non-target classes. This ignores differences between samples and limits generalization. To address this, the paper proposes LABO (LAbel regularization via Bi-level Optimization), which casts LS as a differentiable bilevel optimization problem: the upper level optimizes generalization performance, while the lower level analytically computes the optimal instance-level smoothing distribution. Crucially, LABO does not need to store the parameters or outputs of a trained model, and it enables efficient backpropagation via gradient approximation, which makes it both interpretable and computationally efficient. Extensive experiments across seven machine translation and three image classification benchmarks show that LABO consistently improves model generalization, robustness, and training stability, outperforming standard LS and multiple strong baselines.
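As a point of reference, the conventional, instance-agnostic LS target described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: it follows the summary's description of keeping `1 - eps` on the target class and spreading `eps` evenly over the non-target classes. The function names and the default `eps=0.1` are illustrative choices, not taken from the paper.

```python
import numpy as np

def uniform_ls_targets(labels, num_classes, eps=0.1):
    """Conventional label smoothing (illustrative eps default): keep 1 - eps
    on the target class and spread eps uniformly over the num_classes - 1
    non-target classes, regardless of which instance this is."""
    n = len(labels)
    targets = np.full((n, num_classes), eps / (num_classes - 1))
    targets[np.arange(n), labels] = 1.0 - eps
    return targets

def soft_cross_entropy(logits, targets):
    """Mean cross-entropy between soft target rows and softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())

labels = np.array([0, 2])
targets = uniform_ls_targets(labels, num_classes=4, eps=0.1)
print(targets)  # every row sums to 1; each non-target entry is the same value
```

Every training example gets the same non-target distribution here. That shared distribution is the property the instance-level formulation relaxes.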
📝 Abstract
Regularization techniques are crucial for improving the generalization performance and training efficiency of deep neural networks. Many deep learning algorithms rely on weight decay, dropout, and batch or layer normalization to converge faster and generalize better. Label Smoothing (LS) is another simple, versatile, and efficient regularizer that can be applied to a wide range of supervised classification tasks. Conventional LS, however, assumes that every non-target class is equally likely, regardless of the training instance. In this work, we present a general framework for training with label regularization that includes conventional LS as a special case but can also model instance-specific variants. Building on this formulation, we propose an efficient way of learning LAbel regularization by devising a Bi-level Optimization (LABO) problem. We derive a deterministic and interpretable solution to the inner loop as the optimal label smoothing, without the need to store the parameters or the outputs of a trained model. Finally, we conduct extensive experiments and demonstrate that LABO consistently yields improvements over conventional label regularization in various fields, including seven machine translation and three image classification tasks across various
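The "general framework" in the abstract can be illustrated with a hedged sketch. It assumes the smoothed target is the mixture `(1 - eps) * one_hot + eps * u_i`, where `u_i` is a per-instance distribution that puts zero mass on the target class. Choosing a uniform `u_i` recovers conventional LS, and any other per-instance choice gives an instance-specific variant. All function names are hypothetical. `temperature_dist` is only an illustrative instance-specific choice. It is **not** LABO's derived inner-loop solution, which the abstract does not spell out.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def label_regularized_loss(logits, labels, smoothing_dist, eps=0.1):
    """Mean cross-entropy against (1 - eps) * one_hot + eps * smoothing_dist.
    smoothing_dist[i] is a distribution over classes with zero mass on labels[i]."""
    n, k = logits.shape
    one_hot = np.zeros((n, k))
    one_hot[np.arange(n), labels] = 1.0
    targets = (1.0 - eps) * one_hot + eps * smoothing_dist
    return float(-(targets * log_softmax(logits)).sum(axis=1).mean())

def uniform_dist(labels, k):
    """Conventional LS: equal mass on every non-target class."""
    d = np.full((len(labels), k), 1.0 / (k - 1))
    d[np.arange(len(labels)), labels] = 0.0
    return d

def temperature_dist(logits, labels, tau=2.0):
    """HYPOTHETICAL instance-specific choice, for illustration only (not LABO's
    solution): the model's own tempered softmax, renormalized over non-targets."""
    p = np.exp(log_softmax(logits / tau))
    p[np.arange(len(labels)), labels] = 0.0
    return p / p.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 5))
labels = np.array([1, 0, 4])
loss_ls = label_regularized_loss(logits, labels, uniform_dist(labels, 5))
loss_inst = label_regularized_loss(logits, labels, temperature_dist(logits, labels))
print(loss_ls, loss_inst)
```

The only thing that changes between conventional LS and an instance-specific variant is the `smoothing_dist` argument. In a bilevel setup like the one the abstract describes, that argument is what the inner problem would supply for each training instance.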