For Better or For Worse? Learning Minimum Variance Features With Label Augmentation

📅 2024-02-10

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Label augmentation techniques (e.g., label smoothing, Mixup) are widely used for regularization, yet their impact on learned feature representations remains poorly understood. Method: We theoretically analyze how label augmentation shapes feature structure. For linear models on binary classification data, we prove that training with label augmentation learns only the minimum-variance features, whereas standard training (including weight decay) can learn higher-variance features. For nonlinear models and general data distributions, we lower-bound the label smoothing and Mixup losses by a function of the model output variance, which indicates a bias toward low-variance representations. Empirically, the strong performance of label smoothing and Mixup on image classification benchmarks is correlated with learning low-variance hidden representations, but both methods can also be more susceptible to low-variance spurious correlations in the training data. Contribution/Results: The work offers a variance-based account of how label augmentation regularizes, together with evidence that the same bias can both help and hurt, which is relevant when assessing the risks of these augmentations.

📝 Abstract

Data augmentation has been pivotal in successfully training deep learning models on classification tasks over the past decade. An important subclass of data augmentation techniques - which includes both label smoothing and Mixup - involves modifying not only the input data but also the input label during model training. In this work, we analyze the role played by the label augmentation aspect of such methods. We first prove that linear models on binary classification data trained with label augmentation learn only the minimum variance features in the data, while standard training (which includes weight decay) can learn higher variance features. We then use our techniques to show that even for nonlinear models and general data distributions, the label smoothing and Mixup losses are lower bounded by a function of the model output variance. Lastly, we demonstrate empirically that this aspect of label smoothing and Mixup can be a positive and a negative. On the one hand, we show that the strong performance of label smoothing and Mixup on image classification benchmarks is correlated with learning low variance hidden representations. On the other hand, we show that Mixup and label smoothing can be more susceptible to low variance spurious correlations in the training data.
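To make the two label augmentations named in the abstract concrete, here is a minimal NumPy sketch of how label smoothing and Mixup modify the training targets. It uses the standard textbook formulations; the function names (`one_hot`, `smooth_labels`, `mixup_batch`, `cross_entropy`) and the default values of `alpha` and `beta_param` are illustrative assumptions, not the paper's code or its exact parameterization.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Convert integer class labels to one-hot rows."""
    return np.eye(num_classes)[labels]


def smooth_labels(labels, num_classes, alpha=0.1):
    """Label smoothing: mix each one-hot target with the uniform distribution.

    alpha is an assumed smoothing strength in [0, 1]; alpha=0 recovers
    the original one-hot targets.
    """
    targets = one_hot(labels, num_classes)
    return (1.0 - alpha) * targets + alpha / num_classes


def mixup_batch(inputs, labels, num_classes, beta_param=1.0, rng=None):
    """Mixup: convexly combine pairs of inputs and their one-hot labels.

    A single mixing weight lam ~ Beta(beta_param, beta_param) is drawn per
    batch, and each example is paired with a randomly permuted partner.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(beta_param, beta_param)
    perm = rng.permutation(len(inputs))
    targets = one_hot(labels, num_classes)
    mixed_inputs = lam * inputs + (1.0 - lam) * inputs[perm]
    mixed_targets = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_inputs, mixed_targets, lam


def cross_entropy(logits, targets):
    """Mean cross-entropy between softmax(logits) and soft targets."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(targets * log_probs).sum(axis=1).mean()
```

In both cases the target is no longer a one-hot vector, so the loss is minimized at a specific finite output for each example rather than by pushing the logit margin without bound. That is the "label augmentation aspect" the abstract isolates for analysis.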

Problem

Research questions and friction points this paper is trying to address.

Analyzes label augmentation in deep learning

Explores minimum variance features in models

Evaluates label smoothing and Mixup effects

Innovation

Methods, ideas, or system contributions that make the work stand out.

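Proves linear models trained with label augmentation learn only minimum variance features

Lower-bounds label smoothing and Mixup losses by a function of model output variance

Links low variance representations to both benchmark gains and spurious-correlation risk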
