LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization

📅 2023-05-08

🏛️ Annual Meeting of the Association for Computational Linguistics

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Conventional label smoothing (LS) spreads the smoothing probability uniformly across all non-target classes, regardless of the training instance. To address this, the authors propose LABO (learning LAbel regularization via Bi-level Optimization). LABO is a general label-regularization framework that includes conventional LS as a special case and can also model instance-specific variants. Learning the regularization is cast as a bi-level optimization problem. Its inner loop has a deterministic, interpretable solution that gives the optimal label smoothing without storing the parameters or outputs of a trained model. Experiments on seven machine translation and three image classification tasks show that LABO consistently improves over conventional label regularization.

📝 Abstract

Regularization techniques are crucial to improving the generalization performance and training efficiency of deep neural networks. Many deep learning algorithms rely on weight decay, dropout, and batch/layer normalization to converge faster and generalize better. Label Smoothing (LS) is another simple, versatile, and efficient regularization technique that can be applied to various supervised classification tasks. Conventional LS, however, assumes that each non-target class is equally likely, regardless of the training instance. In this work, we present a general framework for training with label regularization, which includes conventional LS but can also model instance-specific variants. Based on this formulation, we propose an efficient way of learning LAbel regularization by devising a Bi-level Optimization (LABO) problem. We derive a deterministic and interpretable solution of the inner loop as the optimal label smoothing, without the need to store the parameters or the output of a trained model. Finally, we conduct extensive experiments and demonstrate that LABO consistently yields improvements over conventional label regularization in various fields, including seven machine translation and three image classification tasks across various …
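The general framework in the abstract can be pictured as a target distribution that puts 1 - ε on the gold class and spreads ε over the non-target classes according to some distribution u. A uniform u recovers conventional LS. The sketch below is a minimal pure-Python illustration under that assumption. The function names, the smoothing mass `eps`, and the temperature-scaled `instance_specific_u` are hypothetical choices made for illustration. None of them is LABO's derived inner-loop solution.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def softmax(logits):
    """Softmax computed from the stable log-softmax."""
    return [math.exp(lp) for lp in log_softmax(logits)]


def regularized_target(num_classes, target, eps, nontarget_dist=None):
    """Hypothetical helper: q = (1 - eps) on the target, eps * u on non-targets.

    nontarget_dist (u) is a distribution over the num_classes - 1 non-target
    classes, listed in increasing class-index order.
    """
    others = [k for k in range(num_classes) if k != target]
    if nontarget_dist is None:
        # Conventional LS: every non-target class is equally likely.
        nontarget_dist = [1.0 / len(others)] * len(others)
    q = [0.0] * num_classes
    q[target] = 1.0 - eps
    for k, u_k in zip(others, nontarget_dist):
        q[k] = eps * u_k
    return q


def instance_specific_u(logits, target, temperature=1.0):
    """Hypothetical instance-specific u: temperature-scaled softmax over the
    model's own non-target logits (illustrative only, not LABO's solution)."""
    return softmax([z / temperature for k, z in enumerate(logits) if k != target])


def cross_entropy(q, logits):
    """Label-regularized loss -sum_k q_k * log p_k with p = softmax(logits)."""
    return -sum(q_k * lp for q_k, lp in zip(q, log_softmax(logits)))


if __name__ == "__main__":
    logits = [2.0, 0.5, 0.1, -1.0]
    q_ls = regularized_target(4, target=0, eps=0.1)
    u = instance_specific_u(logits, target=0, temperature=2.0)
    q_inst = regularized_target(4, 0, 0.1, u)
    print("conventional LS target:", q_ls)
    print("instance-specific target:", q_inst)
    print("losses:", cross_entropy(q_ls, logits), cross_entropy(q_inst, logits))
```

Because u only reshapes how ε is shared among the non-target classes, any instance-specific rule can be passed to `regularized_target` without changing the loss itself.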

Problem


Learning optimal label regularization via bi-level optimization

Improving generalization and training efficiency in deep learning

Developing instance-specific label smoothing beyond conventional methods

Innovation


Bi-level optimization for label regularization

Instance-specific label smoothing variants

Deterministic, interpretable solution of the inner loop
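This page does not state LABO's inner objective. The following sketch therefore uses a made-up toy objective only to show what a deterministic, interpretable inner-loop solution can look like. The toy objective chooses u over the simplex to minimize KL(u‖p̄) + λ·KL(u‖uniform), where p̄ is the model's renormalized non-target distribution. Setting the gradient of the Lagrangian to zero gives u* ∝ p̄^(1/(1+λ)). This solution is computed from the current predictions alone, with no stored model. `toy_inner_objective`, `closed_form_inner_solution`, and λ (`lam`) are all hypothetical.

```python
import math


def kl(u, v):
    """KL(u || v) for discrete distributions given as lists (0 * log 0 = 0)."""
    return sum(a * math.log(a / b) for a, b in zip(u, v) if a > 0)


def toy_inner_objective(u, p_bar, lam):
    """Hypothetical inner objective (not LABO's): stay close to the model's
    non-target beliefs p_bar while being pulled toward uniform with strength lam."""
    uniform = [1.0 / len(p_bar)] * len(p_bar)
    return kl(u, p_bar) + lam * kl(u, uniform)


def closed_form_inner_solution(p_bar, lam):
    """Minimizer of toy_inner_objective over the simplex: u* ∝ p_bar ** (1 / (1 + lam))."""
    powered = [p ** (1.0 / (1.0 + lam)) for p in p_bar]
    total = sum(powered)
    return [w / total for w in powered]


if __name__ == "__main__":
    p_bar = [0.6, 0.3, 0.1]  # example non-target probabilities
    for lam in (0.0, 1.0, 10.0):
        u_star = closed_form_inner_solution(p_bar, lam)
        print(lam, [round(x, 4) for x in u_star], toy_inner_objective(u_star, p_bar, lam))
```

In this toy setting, λ interpolates between trusting the model and conventional LS. With λ = 0, u* equals p̄. A large λ drives u* toward uniform. This is the kind of interpretability a closed-form inner loop can offer.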
