🤖 AI Summary
This work studies the generalization error of models with frozen intermediate-layer representations and trainable readout layers—such as echo state networks (ESNs) and deep random feature models. Methodologically, it leverages random matrix theory and high-dimensional statistical analysis to derive exact asymptotic expressions for the generalization error in the high-dimensional limit. The analysis reveals that recurrent architectures like ESNs are mathematically equivalent to ridge regression with an exponentially time-decaying weighting applied to the input covariance matrix, thereby characterizing their inductive bias toward recent inputs. This framework provides a unified analytical tool for studying generalization in overparameterized recurrent models. Empirically, theoretical predictions align closely with experiments: ESNs significantly outperform standard ridge regression in small-sample or short-memory tasks, whereas ridge regression excels in large-data regimes or when long-range dependencies dominate.
📝 Abstract
We first study the generalization error of models that use a fixed feature representation (frozen intermediate layers) followed by a trainable readout layer. This setting encompasses a range of architectures, from deep random-feature models to echo-state networks (ESNs) with recurrent dynamics. Working in the high-dimensional regime, we apply Random Matrix Theory to derive a closed-form expression for the asymptotic generalization error. We then apply this analysis to recurrent representations and obtain concise formulas that characterize their performance. Surprisingly, we show that a linear ESN is equivalent to ridge regression with an exponentially time-weighted ("memory") input covariance, revealing a clear inductive bias toward recent inputs. Experiments match the theoretical predictions: ESNs win in low-sample, short-memory regimes, while ridge prevails with more data or long-range dependencies. Our methodology provides a general framework for analyzing overparameterized models and offers insights into the behavior of deep networks.
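The frozen-feature-plus-readout setting described above can be made concrete with a minimal sketch: a linear ESN whose recurrent weights are fixed at random (spectral radius below 1, so the state is an exponentially decaying sum of past inputs) and whose readout is fit by ridge regression on the frozen states. All dimensions, the spectral radius of 0.8, the regularization strength, and the short-memory target are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper)
d_in, d_res, T = 5, 100, 400

# Frozen recurrent weights; rescaling to spectral radius 0.8 makes the
# linear state an exponentially time-weighted sum of past inputs.
W = rng.normal(size=(d_res, d_res)) / np.sqrt(d_res)
W *= 0.8 / max(abs(np.linalg.eigvals(W)))
U = rng.normal(size=(d_res, d_in))  # frozen input weights

X = rng.normal(size=(T, d_in))
# Hypothetical short-memory target: depends only on the current input
y = X @ rng.normal(size=d_in)

# Run the linear reservoir: h_t = W h_{t-1} + U x_t (no nonlinearity)
H = np.zeros((T, d_res))
h = np.zeros(d_res)
for t in range(T):
    h = W @ h + U @ X[t]
    H[t] = h

# Trainable ridge-regression readout on the frozen reservoir states
lam = 1e-2
w_out = np.linalg.solve(H.T @ H + lam * np.eye(d_res), H.T @ y)

# Plain ridge regression on the raw inputs, for comparison
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d_in), X.T @ y)

mse_esn = np.mean((H @ w_out - y) ** 2)
mse_ridge = np.mean((X @ w_ridge - y) ** 2)
print(f"ESN readout MSE:  {mse_esn:.4f}")
print(f"Plain ridge MSE:  {mse_ridge:.4f}")
```

Because the reservoir here is linear, the states `H` are themselves exponentially weighted linear functionals of past inputs, which is the mechanism behind the paper's equivalence between linear ESNs and ridge regression with a time-decaying input covariance.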