🤖 AI Summary
This work investigates how the dynamic coupling between non-trainable variables (e.g., inputs) and trainable parameters (weights and biases) in artificial neural networks induces criticality, even when the input data are themselves non-critical. The authors establish a duality between a subspace of the data space and a tangent subspace of the trainable parameters by modeling forward and backward propagation as a nonlinear composite map, which, linearized at learning equilibrium, reduces to many weakly coupled one-dimensional problems. Using this duality, they show that fluctuations of the trainable variables can follow power-law statistics, and that the activation and loss functions jointly govern the power-law exponent, so criticality can be tuned quantitatively by adjusting these two functions alone. The result suggests a paradigm of "controllable critical learning": criticality emerges spontaneously during standard supervised learning, without critical input data or architectural constraints, revealing it as an intrinsic, tunable property of gradient-based optimization dynamics.
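To make the claim about power-law fluctuations concrete, here is an illustrative (not from the paper) experiment: run SGD on a tiny network fed Gaussian, i.e. non-critical, inputs, record the fluctuations of one trainable variable, and estimate the tail slope of their empirical distribution on log-log axes. The network size, learning rate, tanh activation, quadratic loss, and teacher target are all assumptions made for this sketch; swapping the activation or loss here is how one would probe their effect on the tail.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny network: 3 inputs -> 1 output (sizes are illustrative assumptions).
W = rng.normal(size=(1, 3))
b = np.zeros(1)

def step(x, target, lr=0.05):
    """One SGD step with tanh activation and quadratic loss L = 0.5*(y - target)^2.
    Returns the induced weight change (the 'fluctuation' of the trainable variables)."""
    global W, b
    y = np.tanh(W @ x + b)
    dy = (y - target) * (1.0 - y**2)   # dL/dz via tanh'(z) = 1 - tanh(z)^2
    dW = -lr * np.outer(dy, x)
    W = W + dW
    b = b - lr * dy
    return dW

fluctuations = []
for _ in range(5000):
    x = rng.normal(size=3)             # non-critical (Gaussian) dataset
    t = np.tanh(x.sum())               # fixed teacher target, assumed for the demo
    fluctuations.append(abs(step(x, np.array([t]))[0, 0]))

# Crude tail diagnostic: slope of the empirical CCDF on log-log axes.
f = np.sort(np.array(fluctuations))
f = f[f > 0]
ccdf = 1.0 - np.arange(len(f)) / len(f)
tail = slice(len(f) // 2, None)        # fit only the upper half of the distribution
slope = np.polyfit(np.log(f[tail]), np.log(ccdf[tail]), 1)[0]
print(f"estimated tail slope: {slope:.2f}")
```

A straight line on these axes would indicate a power-law tail, with the slope giving (minus) the exponent; in practice one would use a proper tail estimator rather than a least-squares fit, but this shows the quantity being tuned.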
📝 Abstract
In artificial neural networks, the activation dynamics of non-trainable variables is strongly coupled to the learning dynamics of trainable variables. During the activation pass, the boundary neurons (e.g., input neurons) are mapped to the bulk neurons (e.g., hidden neurons), and during the learning pass, both bulk and boundary neurons are mapped to changes in trainable variables (e.g., weights and biases). For example, in feed-forward neural networks, forward propagation is the activation pass and backward propagation is the learning pass. We show that a composition of the two maps establishes a duality map between a subspace of non-trainable boundary variables (e.g., dataset) and a tangent subspace of trainable variables (i.e., learning). In general, the dataset-learning duality is a complex non-linear map between high-dimensional spaces, but in a learning equilibrium, the problem can be linearized and reduced to many weakly coupled one-dimensional problems. We use the duality to study the emergence of criticality, or the power-law distributions of fluctuations of the trainable variables. In particular, we show that criticality can emerge in the learning system even from the dataset in a non-critical state, and that the power-law distribution can be modified by changing either the activation function or the loss function.
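The composition of the two maps described above can be sketched explicitly for a one-hidden-layer feed-forward network. Everything concrete here (layer sizes, tanh activation, quadratic loss, learning rate) is an assumption made for the demo, not taken from the paper: the activation pass maps boundary neurons to bulk neurons, the learning pass maps both to changes in the trainable variables, and their composite is the duality map from a point in data space to a tangent vector in parameter space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny feed-forward network: 3 boundary (input) -> 4 bulk (hidden) -> 1 output.
# All sizes and functions are illustrative assumptions.
W1 = rng.normal(size=(4, 3)); b1 = rng.normal(size=4)
W2 = rng.normal(size=(1, 4)); b2 = rng.normal(size=1)

def activation_pass(x):
    """Forward propagation: boundary neurons -> bulk neurons."""
    h = np.tanh(W1 @ x + b1)
    y = W2 @ h + b2
    return h, y

def learning_pass(x, h, y, target, lr=0.1):
    """Backward propagation: (boundary, bulk) neurons -> changes in trainable
    variables, for the quadratic loss L = 0.5*(y - target)^2."""
    dy = y - target                    # dL/dy
    dW2 = np.outer(dy, h)
    dh = W2.T @ dy
    dz = dh * (1.0 - h**2)             # tanh'(z) = 1 - tanh(z)^2
    dW1 = np.outer(dz, x)
    return -lr * dW1, -lr * dW2        # gradient-descent increments

def duality_map(x, target):
    """Composite map: a dataset point -> a tangent vector of the trainable variables."""
    h, y = activation_pass(x)
    return learning_pass(x, h, y, target)

x = rng.normal(size=3)
dW1, dW2 = duality_map(x, target=np.array([0.5]))
print(dW1.shape, dW2.shape)  # → (4, 3) (1, 4)
```

Each input thus picks out one learning increment; studying the statistics of these increments over a dataset is what the linearization at equilibrium makes tractable.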