🤖 AI Summary
This work investigates how gradient-based meta-learning fosters transferable shared representations in nonlinear two-layer neural networks trained through bilevel optimization in a teacher-student framework, and elucidates the mechanistic link between representation formation and few-shot generalization to novel tasks. Methodologically, it applies statistical-physics mean-field theory to characterize the macroscopic meta-training dynamics, yielding the first quantitative analysis of the conditions under which a shared representation emerges and of its sensitivity to hyperparameters such as the inner-loop step count and learning rate. A streaming-task meta-training framework is proposed, establishing asymptotic representation-convergence paths and generalization-error bounds in the teacher-student setting. The theoretical findings are empirically validated, closing a critical gap in the understanding of representation dynamics during meta-learning. Collectively, this work provides a principled foundation for hyperparameter design and for modeling task-generic knowledge in meta-learning.
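To make the bilevel, streaming-task structure concrete, below is a minimal NumPy sketch of meta-training a two-layer student on a stream of teacher-generated tasks. Every task shares the same teacher representation but draws a fresh task-specific head; the student adapts its own head in an inner loop on a support set, and its shared first layer is updated on a query set. All dimensions, sample sizes, learning rates, and the first-order outer update are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and hyperparameters (assumptions, not the paper's values).
d, M, K = 100, 2, 2             # input dim, teacher width, student width
n_support, n_query = 20, 20     # per-task sample sizes
inner_steps, inner_lr = 1, 0.5  # inner-loop adaptation hyperparameters
outer_lr, n_tasks = 0.05, 5000  # outer-loop step size, length of task stream

g = np.tanh  # nonlinear activation; note g'(z) = 1 - g(z)**2

# Shared teacher representation, fixed across the whole task stream.
W_star = rng.standard_normal((M, d))

# Student: shared first layer W (meta-learned), head a (adapted per task).
W = 0.1 * rng.standard_normal((K, d))

def teacher_labels(X, v):
    return g(X @ W_star.T / np.sqrt(d)) @ v

def student_forward(X, W, a):
    h = g(X @ W.T / np.sqrt(d))  # hidden activations
    return h @ a, h

for _ in range(n_tasks):
    v = rng.standard_normal(M)                # fresh task-specific teacher head
    Xs = rng.standard_normal((n_support, d))  # support set
    Xq = rng.standard_normal((n_query, d))    # query set
    ys, yq = teacher_labels(Xs, v), teacher_labels(Xq, v)

    # Inner loop: adapt the task head on the support set (squared loss).
    a = np.zeros(K)
    for _ in range(inner_steps):
        pred, h = student_forward(Xs, W, a)
        a -= inner_lr * h.T @ (pred - ys) / n_support

    # Outer loop: update the shared layer on the query set. First-order
    # approximation: the gradient through the inner adaptation is ignored.
    pred, h = student_forward(Xq, W, a)
    err = (pred - yq) / n_query
    W -= outer_lr * (err[:, None] * (1.0 - h**2) * a).T @ Xq / np.sqrt(d)
```

The `inner_steps` and `inner_lr` knobs correspond to the hyperparameters whose choice the analysis flags as important; varying them in a sketch like this is a cheap way to probe when the shared layer does or does not align with the teacher.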
📝 Abstract
Gradient-based meta-learning algorithms have gained popularity for their ability to train models on new tasks using limited data. Empirical observations indicate that such algorithms can learn a shared representation across tasks, which is regarded as a key factor in their success. However, a deep theoretical understanding of the learning dynamics and of the origin of the shared representation remains underdeveloped. In this work, we investigate the meta-learning dynamics of nonlinear two-layer neural networks trained on streaming tasks in the teacher-student scenario. Through the lens of statistical physics, we characterize the macroscopic behavior of the meta-training process, the formation of the shared representation, and the generalization ability of the model on new tasks. The analysis also points to the importance of choosing certain hyperparameters of the learning algorithm, such as the inner-loop step count and learning rate.
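In statistical-physics analyses of the teacher-student scenario, the "macroscopic behavior" is typically tracked through a handful of order parameters: overlaps between student and teacher first-layer weights. The helper below is a minimal sketch of those standard overlaps; the paper's exact observables may differ.

```python
import numpy as np

def order_parameters(W, W_star):
    """Overlap matrices commonly tracked in mean-field treatments of
    two-layer teacher-student learning (illustrative helper; the
    paper's exact observables may differ).

    W      : (K, d) student first-layer weights
    W_star : (M, d) teacher first-layer weights
    """
    d = W.shape[1]
    Q = W @ W.T / d            # student-student overlaps (representation geometry)
    R = W @ W_star.T / d       # student-teacher overlaps (alignment with teacher)
    T = W_star @ W_star.T / d  # teacher-teacher overlaps (fixed by the task family)
    return Q, R, T
```

In the high-dimensional limit (d to infinity with K, M fixed), such overlaps typically evolve according to a closed set of ordinary differential equations, and the generalization error on a fresh task depends on the weights only through (Q, R, T); this is what makes a macroscopic characterization of meta-training, and of when the student's representation aligns with the teacher's, analytically tractable.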