LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization

📅 2023-05-08
🏛️ Annual Meeting of the Association for Computational Linguistics
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional label smoothing (LS) distributes the smoothing probability uniformly across all non-target classes regardless of the training instance, which limits generalization. To address this, the paper presents a general framework for label regularization that subsumes conventional LS and also models instance-specific variants, and proposes LABO: learning LAbel regularization by formulating it as a Bi-level Optimization problem. The inner loop admits a deterministic, interpretable solution for the optimal label smoothing, computed without storing the parameters or outputs of a trained model, which keeps training efficient. Extensive experiments on seven machine translation and three image classification tasks show that LABO consistently improves over conventional label regularization and strong baselines.
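For reference, the conventional uniform LS that the summary contrasts against can be sketched as follows. This is the standard formulation (target keeps 1 − ε, with ε split evenly over the K − 1 non-target classes), not code from the paper; the function name and the ε = 0.1 default are illustrative choices.

```python
def smooth_labels(target: int, num_classes: int, eps: float = 0.1) -> list[float]:
    """Uniform label smoothing: the target class keeps probability 1 - eps,
    and eps is spread evenly over the remaining num_classes - 1 classes."""
    dist = [eps / (num_classes - 1)] * num_classes
    dist[target] = 1.0 - eps
    return dist

# Example: 4 classes, target class 2 -> [0.0333..., 0.0333..., 0.9, 0.0333...]
probs = smooth_labels(target=2, num_classes=4, eps=0.1)
```

LABO's point is precisely that this per-class mass need not be uniform: the optimal smoothing distribution can depend on the instance.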
📝 Abstract
Regularization techniques are crucial to improving the generalization performance and training efficiency of deep neural networks. Many deep learning algorithms rely on weight decay, dropout, batch/layer normalization to converge faster and generalize. Label Smoothing (LS) is another simple, versatile and efficient regularization which can be applied to various supervised classification tasks. Conventional LS, however, regardless of the training instance assumes that each non-target class is equally likely. In this work, we present a general framework for training with label regularization, which includes conventional LS but can also model instance-specific variants. Based on this formulation, we propose an efficient way of learning LAbel regularization by devising a Bi-level Optimization (LABO) problem. We derive a deterministic and interpretable solution of the inner loop as the optimal label smoothing without the need to store the parameters or the output of a trained model. Finally, we conduct extensive experiments and demonstrate our LABO consistently yields improvement over conventional label regularization on various fields, including seven machine translation and three image classification tasks across various
Problem

Research questions and friction points this paper is trying to address.

Learning optimal label regularization via bi-level optimization
Improving generalization and training efficiency in deep learning
Developing instance-specific label smoothing beyond conventional methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bi-level optimization for label regularization
Instance-specific label smoothing variants
Deterministic interpretable optimal solution
Peng Lu
Department of Computer Science and Operations Research, Université de Montréal
Ahmad Rashid
Vector Institute; University of Waterloo
Machine Learning · Natural Language Processing · Linguistics
I. Kobyzev
Huawei Noah’s Ark Lab, Canada
Mehdi Rezagholizadeh
Principal Research Scientist, Advanced Micro Devices (AMD)
Efficient AI · NLP/Computer Vision · Deep Learning
P. Langlais
Department of Computer Science and Operations Research, Université de Montréal