AI Summary
This work reveals an intrinsic frequency preference in first-order optimizers that critically influences optimization trajectories and generalization performance. To exploit this, we propose Natural Spectral Fusion (NSF): a method that models the optimizer as a spectral controller and dynamically reweights and fuses low-frequency (stability-promoting) and high-frequency (detail-capturing) components via a *p*-exponent cyclic scheduling scheme, without modifying the model or data and without additional computational overhead. NSF introduces the first generalized second-moment term supporting both positive and negative *p* exponents, enabling controllable cross-band spectral fusion. Across multiple benchmarks, NSF significantly reduces test error under identical learning rates and fixed hyperparameters; on certain tasks, it matches baseline accuracy with only 25% of the training cost. Empirical results demonstrate faster convergence and superior generalization, establishing NSF as a principled, lightweight spectral enhancement for first-order optimization.
Abstract
Spectral behaviors have been widely discussed in machine learning, yet the optimizer's own spectral bias remains unclear. We argue that first-order optimizers exhibit an intrinsic frequency preference that significantly reshapes the optimization path. To address this, we propose Natural Spectral Fusion (NSF): reframing training as controllable spectral coverage and information fusion rather than merely scaling step sizes. NSF has two core principles: treating the optimizer as a spectral controller that dynamically balances low- and high-frequency information; and periodically reweighting frequency bands at negligible cost, without modifying the model, data, or training pipeline. We realize NSF via a *p*-exponent extension of the second-moment term, enabling both positive and negative exponents, and implement it through cyclic scheduling. Theory and experiments show that adaptive methods emphasize low frequencies, SGD is near-neutral, and negative exponents amplify high-frequency information. Cyclic scheduling broadens spectral coverage, improves cross-band fusion, and induces early decision-boundary alignment, where accuracy improves even while loss remains high. Across multiple benchmarks, with identical learning-rate strategies and fixed hyperparameters, *p*-exponent cyclic scheduling consistently reduces test error and demonstrates distinct convergence behavior; on some tasks, it matches baseline accuracy with only one-quarter of the training cost. Overall, NSF reveals the optimizer's role as an active spectral controller and provides a unified, controllable, and efficient framework for first-order optimization.
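To make the mechanism concrete, here is a minimal scalar sketch of the idea described above: an Adam-style update whose second-moment preconditioner is raised to a tunable exponent *p* (so *p* = 0.5 recovers the usual 1/√v, *p* = 0 is spectrally near-neutral like momentum SGD, and *p* < 0 amplifies high-frequency information), together with a triangular cyclic schedule for *p*. The function names, the schedule shape, and the specific *p* range are illustrative assumptions, not the paper's exact implementation.

```python
def p_adam_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
                p=0.5, eps=1e-8):
    """One Adam-like step with a generalized p-exponent second moment.

    p = 0.5 recovers Adam's 1/sqrt(v) preconditioner; p = 0 reduces to
    bias-corrected momentum SGD; p < 0 multiplies by v**|p| instead of
    dividing, amplifying high-frequency (large-gradient) directions.
    Scalar version for clarity; a real optimizer applies this elementwise.
    """
    t = state["t"] = state.get("t", 0) + 1
    m = state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * grad
    v = state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment
    # Generalized preconditioner: divide by (v_hat + eps)^p.
    return param - lr * m_hat / (v_hat + eps) ** p

def cyclic_p(step, period=1000, p_min=-0.2, p_max=0.5):
    """Triangular cyclic schedule sweeping p from p_min up to p_max and back,
    periodically reweighting low- vs. high-frequency bands (assumed shape)."""
    phase = (step % period) / period
    tri = 1 - abs(2 * phase - 1)          # 0 -> 1 -> 0 over one cycle
    return p_min + (p_max - p_min) * tri
```

In a training loop one would call `p_adam_step(param, grad, state, p=cyclic_p(step))`, so the optimizer sweeps its spectral emphasis each cycle while the learning-rate schedule and all other hyperparameters stay fixed.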