🤖 AI Summary
This work investigates how implicit bias arises in machine learning optimization, that is, how optimization algorithms select particular solutions among the many that fit the training objective. By combining the continuous symmetries of model parameterizations with the stochasticity inherent in optimization, the study offers a unified explanation of implicit bias as a geometric correction induced by the learning dynamics in the associated quotient space. Leveraging tools from differential geometry, stochastic differential equations, and Lie group theory, the authors develop a general framework that enables both forward prediction and inverse design of implicit biases. The approach predicts and controls diverse forms of implicit bias, such as sparsity and spectral properties, across various architectures, with numerical experiments showing close agreement with the theoretical predictions.
📝 Abstract
A central problem in machine learning theory is to characterize how learning dynamics select particular solutions among the many compatible with the training objective, a phenomenon known as implicit bias that remains only partially understood. In the present work, we identify a general mechanism for the emergence of implicit biases, expressed as an explicit geometric correction of the learning dynamics, arising from the interaction between continuous symmetries in the model's parameterization and stochasticity in the optimization process. Our viewpoint is constructive in two complementary directions: given model symmetries, one can derive the implicit bias they induce; conversely, one can inverse-design a wide class of implicit biases by computing specific redundant parameterizations. More precisely, we show that, when the dynamics is expressed in the quotient space obtained by factoring out the symmetry group of the parameterization, the resulting stochastic differential equation acquires a closed-form geometric correction whose stationary distribution favors orbits with small local volume. We compute the resulting symmetry-induced bias for a range of architectures, showing how several well-known results fit into a single unified framework. The approach also provides a practical methodology for deriving implicit biases in new settings, and it yields concrete, testable predictions that we confirm by numerical simulations on toy models trained on synthetic data, leaving more complex scenarios for future work. Finally, we test the implicit bias inverse-design procedure in notable cases, including biases toward sparsity in linear features or in spectral properties of the model parameters.
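The sparsity example mentioned at the end can be illustrated with a minimal numerical sketch (this is not the paper's code; the problem sizes, step size, and initialization scale are arbitrary choices). A known instance of symmetry-induced bias is the redundant parameterization w = u ⊙ v, whose rescaling symmetry biases gradient descent from small initialization toward sparse interpolators in underdetermined linear regression:

```python
import numpy as np

# Illustrative toy, not the paper's implementation: underdetermined regression
# with a nonnegative 3-sparse ground truth.
rng = np.random.default_rng(0)
n, m = 20, 10                          # 20 features, only 10 samples
w_true = np.zeros(n)
w_true[[2, 7, 11]] = [1.0, 0.6, 0.3]   # sparse, nonnegative target
X = rng.standard_normal((m, n)) / np.sqrt(m)
y = X @ w_true

# Redundant parameterization w = u * v, trained by plain gradient descent
# from a small initialization (the "rich" regime).
u = np.full(n, 1e-3)
v = np.full(n, 1e-3)
lr = 0.02
for _ in range(50_000):
    g = X.T @ (X @ (u * v) - y)        # gradient of 0.5 * ||X w - y||^2 wrt w
    u, v = u - lr * g * v, v - lr * g * u

w_uv = u * v                           # biased toward small L1 norm
w_l2 = np.linalg.pinv(X) @ y           # minimum-L2-norm interpolator, for contrast

print("L1 norms:", np.abs(w_uv).sum(), "vs", np.abs(w_l2).sum())
```

Both `w_uv` and `w_l2` fit the data, but the redundantly parameterized run ends up with a markedly smaller L1 norm concentrated on the true support, whereas the minimum-L2 solution spreads mass across all coordinates.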