🤖 AI Summary
This work studies the generalization error of models with frozen intermediate-layer representations and trainable readout layers—such as echo state networks (ESNs) and deep random feature models. Methodologically, it leverages random matrix theory and high-dimensional statistical analysis to derive exact asymptotic expressions for the generalization error in the high-dimensional limit. The analysis reveals that recurrent architectures like ESNs are mathematically equivalent to ridge regression with an exponentially time-decaying weighting applied to the input covariance matrix, thereby characterizing their inductive bias toward recent inputs. This framework provides a unified analytical tool for studying generalization in overparameterized recurrent models. Empirically, theoretical predictions align closely with experiments: ESNs significantly outperform standard ridge regression in small-sample or short-memory tasks, whereas ridge regression excels in large-data regimes or when long-range dependencies dominate.
📝 Abstract
We first study the generalization error of models that use a fixed feature representation (frozen intermediate layers) followed by a trainable readout layer. This setting encompasses a range of architectures, from deep random-feature models to echo-state networks (ESNs) with recurrent dynamics. Working in the high-dimensional regime, we apply Random Matrix Theory to derive a closed-form expression for the asymptotic generalization error. We then apply this analysis to recurrent representations and obtain concise formulas that characterize their performance. Surprisingly, we show that a linear ESN is equivalent to ridge regression with an exponentially time-weighted ("memory") input covariance, revealing a clear inductive bias toward recent inputs. Experiments match the theoretical predictions: ESNs win in low-sample, short-memory regimes, while ridge prevails with more data or long-range dependencies. Our methodology provides a general framework for analyzing overparameterized models and offers insights into the behavior of deep networks.
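The frozen-feature-plus-readout setting described above can be made concrete with a minimal sketch: a linear ESN whose recurrent weights are fixed at random (spectral radius below 1, so the state is an exponentially decaying sum of past inputs) and whose readout is fit by ridge regression on the frozen states. All dimensions, the spectral radius of 0.8, the regularization strength, and the short-memory target are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper)
d_in, d_res, T = 5, 100, 400

# Frozen recurrent weights; rescaling to spectral radius 0.8 makes the
# linear state an exponentially time-weighted sum of past inputs.
W = rng.normal(size=(d_res, d_res)) / np.sqrt(d_res)
W *= 0.8 / max(abs(np.linalg.eigvals(W)))
U = rng.normal(size=(d_res, d_in))  # frozen input weights

X = rng.normal(size=(T, d_in))
# Hypothetical short-memory target: depends only on the current input
y = X @ rng.normal(size=d_in)

# Run the linear reservoir: h_t = W h_{t-1} + U x_t (no nonlinearity)
H = np.zeros((T, d_res))
h = np.zeros(d_res)
for t in range(T):
    h = W @ h + U @ X[t]
    H[t] = h

# Trainable ridge-regression readout on the frozen reservoir states
lam = 1e-2
w_out = np.linalg.solve(H.T @ H + lam * np.eye(d_res), H.T @ y)

# Plain ridge regression on the raw inputs, for comparison
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d_in), X.T @ y)

mse_esn = np.mean((H @ w_out - y) ** 2)
mse_ridge = np.mean((X @ w_ridge - y) ** 2)
print(f"ESN readout MSE:  {mse_esn:.4f}")
print(f"Plain ridge MSE:  {mse_ridge:.4f}")
```

Because the reservoir here is linear, the states `H` are themselves exponentially weighted linear functionals of past inputs, which is the mechanism behind the paper's equivalence between linear ESNs and ridge regression with a time-decaying input covariance.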