🤖 AI Summary
Large-scale hyperparameter tuning is computationally expensive and lacks rigorous theoretical foundations. Method: This paper introduces the "trajectory invariance" principle: coupling learning rate and weight decay through a single combined quantity yields near-identical training loss curves, gradient noise profiles, and gradient norm dynamics across diverse hyperparameter configurations, effectively collapsing the two-dimensional tuning space into a one-dimensional manifold. Contribution/Results: The principle supplies a general guiding rule for hyperparameter optimization, substantially reducing search dimensionality and tuning cost. It refines existing scaling laws and challenges conventional assumptions, such as tuning learning rate and weight decay independently. Validated across multiple architectures and tasks via pre-training loss analysis, gradient noise modeling, and empirical gradient norm observations, the principle shows broad applicability and practical utility.
📝 Abstract
As hyperparameter tuning becomes increasingly costly at scale, efficient tuning methods are essential. Yet principles for guiding hyperparameter tuning remain limited. In this work, we seek to establish such principles by considering a broad range of hyperparameters, including batch size, learning rate, and weight decay. We identify a phenomenon we call trajectory invariance, where pre-training loss curves, gradient noise, and gradient norm are invariant (i.e., closely overlap) with respect to a quantity that combines learning rate and weight decay. This phenomenon effectively reduces the original two-dimensional hyperparameter space to one dimension, yielding an efficient tuning rule: follow the salient direction revealed by trajectory invariance. Furthermore, we refine previous scaling laws and challenge several existing viewpoints. Overall, our work proposes new principles for efficient tuning and inspires future research on scaling laws.
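To make the dimensionality reduction concrete, here is a minimal sketch of how a 2-D (learning rate, weight decay) grid collapses to a 1-D search when configurations sharing the same combined quantity are treated as equivalent. The abstract does not state what that quantity is; the product `lr * wd` used below is a placeholder assumption for illustration only, not the paper's definition.

```python
# Hypothetical sketch: collapsing a 2-D (lr, wd) grid into a 1-D search.
# ASSUMPTION: the invariant combining quantity is lr * wd (placeholder;
# the abstract does not specify the actual definition).
from itertools import product

lrs = [1e-4, 3e-4, 1e-3, 3e-3]
wds = [0.01, 0.03, 0.1, 0.3]

def combined(lr, wd):
    # assumed invariant quantity; stand-in for the paper's actual formula
    return lr * wd

# Full 2-D grid: every (lr, wd) pair
grid_2d = list(product(lrs, wds))

# Under trajectory invariance, configs with the same combined value are
# expected to trace near-identical loss curves, so one representative per
# invariant value suffices.
representatives = {}
for lr, wd in grid_2d:
    key = round(combined(lr, wd), 12)  # quantize away float noise
    representatives.setdefault(key, (lr, wd))

print(len(grid_2d), "->", len(representatives), "configs to evaluate")
```

In this toy grid the 16 original configurations reduce to 9 distinct invariant values, and in general the search cost scales with the number of distinct values of the combined quantity rather than with the full grid size.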