Generalization in Representation Models via Random Matrix Theory: Application to Recurrent Networks

📅 2025-11-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work studies the generalization error of models with frozen intermediate-layer representations and trainable readout layers, such as echo state networks (ESNs) and deep random feature models. Methodologically, it leverages random matrix theory and high-dimensional statistical analysis to derive exact asymptotic expressions for the generalization error in the high-dimensional limit. The analysis shows that recurrent architectures such as ESNs are mathematically equivalent to ridge regression with an exponentially time-decaying regularization kernel applied to the input covariance matrix, thereby characterizing their inductive bias toward recent inputs. This framework provides a unified analytical tool for studying generalization in overparameterized recurrent models. Empirically, the theoretical predictions align closely with experiments: ESNs significantly outperform standard ridge regression in small-sample or short-memory tasks, whereas ridge regression excels in large-data regimes or when long-range dependencies dominate.

📝 Abstract
We first study the generalization error of models that use a fixed feature representation (frozen intermediate layers) followed by a trainable readout layer. This setting encompasses a range of architectures, from deep random-feature models to echo-state networks (ESNs) with recurrent dynamics. Working in the high-dimensional regime, we apply Random Matrix Theory to derive a closed-form expression for the asymptotic generalization error. We then apply this analysis to recurrent representations and obtain concise formulas that characterize their performance. Surprisingly, we show that a linear ESN is equivalent to ridge regression with an exponentially time-weighted ("memory") input covariance, revealing a clear inductive bias toward recent inputs. Experiments match predictions: ESNs win in low-sample, short-memory regimes, while ridge prevails with more data or long-range dependencies. Our methodology provides a general framework for analyzing overparameterized models and offers insights into the behavior of deep learning networks.
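The stated equivalence can be motivated by unrolling the linear state recursion. The derivation below is schematic, in our own notation, and is not taken verbatim from the paper:

```latex
x_t = W x_{t-1} + w_{\mathrm{in}} u_t
\;\Longrightarrow\;
x_t = \sum_{k \ge 0} W^{k} w_{\mathrm{in}}\, u_{t-k},
\qquad
\mathbb{E}\!\left[x_t x_t^{\top}\right]
= \sum_{k,\ell \ge 0} W^{k} w_{\mathrm{in}}\,
  \mathbb{E}\!\left[u_{t-k}\, u_{t-\ell}\right]
  w_{\mathrm{in}}^{\top} \left(W^{\ell}\right)^{\top}.
```

Since \(\lVert W^{k} w_{\mathrm{in}} \rVert\) shrinks geometrically once the spectral radius of \(W\) is below one, lag-\(k\) inputs enter the state covariance with exponentially decaying weight, which is the time-weighted ("memory") input covariance the abstract refers to.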
Problem

Research questions and friction points this paper is trying to address.

Analyzing generalization error in fixed-representation models using Random Matrix Theory
Deriving closed-form expressions for recurrent networks' asymptotic performance
Comparing echo-state networks and ridge regression under different memory regimes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Random Matrix Theory for generalization error analysis
Closed-form expression for recurrent network performance
Linear ESN equivalence to time-weighted ridge regression
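The fading-memory mechanism behind this equivalence can be checked numerically. The sketch below is a minimal NumPy illustration with arbitrary sizes and a hypothetical reservoir setup, not the paper's experimental code: it verifies that a linear ESN state is exactly a sum of past inputs filtered through powers of the frozen recurrent matrix, so each lag enters with a weight that decays with the spectral radius.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 200          # reservoir size and sequence length (illustrative)
rho = 0.9               # spectral radius < 1 gives fading memory

# Frozen random reservoir, rescaled so its spectral radius equals rho
W = rng.standard_normal((N, N))
W *= rho / np.abs(np.linalg.eigvals(W)).max()
w_in = rng.standard_normal(N)   # frozen input weights

u = rng.standard_normal(T)      # scalar input sequence

# Linear ESN state recursion: x_t = W x_{t-1} + w_in * u_t
x = np.zeros(N)
for t in range(T):
    x = W @ x + w_in * u[t]

# Unrolled form: x_T = sum_k (W^k w_in) * u_{T-1-k}
# Each past input u_{T-1-k} enters through W^k w_in, whose norm
# decays geometrically in k (roughly like rho^k).
x_unrolled = np.zeros(N)
Wk_win = w_in.copy()
for k in range(T):
    x_unrolled += Wk_win * u[T - 1 - k]
    Wk_win = W @ Wk_win

assert np.allclose(x, x_unrolled)
```

Because the readout is trained by ridge regression on these states, the effective design matrix is an exponentially time-weighted view of the input history, matching the claimed inductive bias toward recent inputs.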
Yessin Moakher
Huawei Noah’s Ark Lab, Huawei Technologies, Paris, France; École Polytechnique, France
Malik Tiomoko
Huawei Noah’s Ark Lab, Huawei Technologies, Paris, France
Cosme Louart
Assistant Professor, Chinese University of Hong Kong, Shenzhen
Random matrices · Concentration of measure · Machine learning
Zhenyu Liao
Huazhong University of Science and Technology, Wuhan, China