🤖 AI Summary
Conventional gradient descent optimization relies heavily on purely exploitative dynamics, limiting exploration during training. Method: We identify that significantly increasing the learning rate drives neural network training dynamics into a critical regime near the edge of chaos, characterized by a balanced exploration-exploitation trade-off, a positive maximal Lyapunov exponent, and sensitive dependence on initial conditions. Using dynamical systems modeling, Lyapunov spectrum analysis, and quantitative sensitivity measures, we rigorously characterize transient chaos in supervised learning. Contribution/Results: We demonstrate, for the first time, the constructive role of transient chaos in accelerating convergence without sacrificing generalization. Empirical validation across multiple tasks (e.g., MNIST) and architectures shows that this critical regime minimizes the time to reach a target accuracy while maintaining or improving test performance. The work establishes a novel paradigm, "accelerating training at the edge of chaos," and confirms its robustness across tasks, network architectures, and hyperparameters.
📝 Abstract
Traditional algorithms for optimizing artificial neural networks in supervised learning tasks are usually exploitation-type relaxational dynamics such as gradient descent (GD). Here, we explore the dynamics of the neural network's trajectory during training for unconventionally large learning rates. We show that, for a region of learning-rate values, GD optimization shifts away from a purely exploitative algorithm into a regime of exploration-exploitation balance: the neural network is still capable of learning, but the trajectory shows sensitive dependence on initial conditions, as characterized by a positive network maximum Lyapunov exponent. Interestingly, the characteristic training time required to reach an acceptable accuracy on the test set reaches a minimum precisely in this learning-rate region, further suggesting that one can accelerate the training of artificial neural networks by placing the dynamics at the onset of chaos. Our results -- initially illustrated for the MNIST classification task -- qualitatively hold for a range of supervised learning tasks, network architectures and other hyperparameters, and showcase the emergent, constructive role of transient chaotic dynamics in the training of artificial neural networks.
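The time-to-target-accuracy measurement described in the abstract can be sketched as a learning-rate sweep: train identically initialized models at several learning rates and record the first step at which each reaches a target accuracy. The snippet below uses a toy separable logistic-regression problem purely to illustrate the protocol; being convex, it cannot reproduce the paper's edge-of-chaos minimum, and all names and settings are assumptions rather than the authors' setup:

```python
import numpy as np

def steps_to_accuracy(lr, X, y, target=0.9, max_steps=2000, seed=0):
    """Train a logistic model with plain GD; return the first step at which
    training accuracy reaches `target` (max_steps if it never does)."""
    rng = np.random.default_rng(seed)
    w = 0.01 * rng.normal(size=X.shape[1])    # small init, shared across lrs
    for step in range(max_steps):
        # sigmoid predictions, with clipped logits for numerical stability
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30.0, 30.0)))
        if np.mean((p > 0.5) == y) >= target:
            return step
        w -= lr * (X.T @ (p - y)) / len(y)
    return max_steps

# Linearly separable toy data, so any stable learning rate can hit the target
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X @ rng.normal(size=10) > 0).astype(float)

# Sweep learning rates from cautious to aggressive and record time-to-target
sweep = {lr: steps_to_accuracy(lr, X, y) for lr in (0.01, 0.1, 1.0, 10.0)}
```

On this toy problem the moderate rates reach the target in far fewer steps than the tiny one; the paper's claim is that for real networks the curve of time-to-target vs. learning rate attains its minimum in the chaotic-transient region, just before training destabilizes.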