🤖 AI Summary
This work studies why loss functions discovered through meta-learning improve neural network training, focusing on TaylorGLO, a framework that parameterizes candidate loss functions as multivariate Taylor polynomials and optimizes their coefficients. Methodologically, it combines Taylor-expansion-based loss parameterization, decomposition of learning rules, and dynamical-systems analysis of the resulting training trajectories. Theoretically, it shows that TaylorGLO-evolved loss functions induce a phase-wise regularization mechanism: they regularize at both the beginning and end of training while maintaining an invariant in between, and this invariant can be exploited to make the meta-learning process more efficient. Experiments indicate improvements in generalization, training speed, data utilization, and robustness to adversarial attacks. Loss-function optimization thereby emerges as a theoretically grounded aspect of meta-learning in neural networks.
📝 Abstract
Loss-function metalearning can be used to discover novel, customized loss functions for deep neural networks, resulting in improved performance, faster training, and better data utilization. A likely explanation is that such functions discourage overfitting, leading to effective regularization. This paper demonstrates theoretically that this is indeed the case: decomposing the learning rules makes it possible to characterize the training dynamics and show that loss functions evolved through TaylorGLO regularize at both the beginning and end of learning, and maintain an invariant in between. The invariant can be utilized to make the metalearning process more efficient in practice, and the regularization can train networks that are robust against adversarial attacks. Loss-function optimization can thus be seen as a well-founded new aspect of metalearning in neural networks.
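As background for the abstract's reference to TaylorGLO, the sketch below shows the general shape of a Taylor-parameterized loss: a fixed-order polynomial in the prediction and the target, whose coefficients are the meta-learned parameters. The expansion variables, order, expansion center, and coefficient layout here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def taylor_loss(y_true, y_pred, theta, c=0.5):
    """Loss built from a third-order bivariate Taylor polynomial in
    (y_pred, y_true), expanded around a center c. `theta` holds the 10
    learned coefficients, one per monomial dp**i * dt**j with
    i + j <= 3 (illustrative layout, not TaylorGLO's exact one)."""
    dp = np.asarray(y_pred) - c  # deviation of prediction from center
    dt = np.asarray(y_true) - c  # deviation of target from center
    # All monomials of total degree <= 3, in a fixed enumeration order
    monomials = [dp**i * dt**j
                 for i in range(4) for j in range(4) if i + j <= 3]
    per_elem = sum(th * m for th, m in zip(theta, monomials))
    return float(np.mean(per_elem))
```

A meta-optimizer (TaylorGLO uses an evolutionary strategy) would then search over `theta`, evaluating each candidate by training a network with the induced loss. The parameterization is expressive enough to recover familiar losses: with the coefficients of `dt**2`, `dp*dt`, and `dp**2` set to 1, -2, and 1 (and the rest zero), the polynomial collapses to `(y_pred - y_true)**2`, i.e. mean squared error, regardless of the center `c`.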