🤖 AI Summary
This study challenges the prevailing assumption that complex neural architectures are inherently superior for syntactic learning, investigating whether minimalist architectures can achieve strong grammatical competence.
Method: We propose a lightweight, gradient-free large-scale Echo State Network (ESN) that scales only via expanded hidden-state dimensionality—without backpropagation or structural elaboration—and train it on a 100-million-word corpus.
Contribution/Results: Our ESN matches or surpasses comparably sized Transformer baselines on established syntactic acceptability benchmarks (CoLA, BLiMP). This is the first empirical demonstration that reservoir computing scales to 100-million-word language modeling while retaining robust syntactic generalization. The results provide a lower bound on the grammatical learnability of low-complexity recurrent architectures and open a pathway toward efficient, interpretable language models grounded in dynamical systems principles.
📝 Abstract
What is a neural model with minimal architectural complexity that still exhibits reasonable language-learning capability? To explore such a simple but sufficient neural language model, we revisit a basic reservoir computing (RC) model, the Echo State Network (ESN), a restricted class of simple recurrent neural networks. Our experiments show that an ESN with a large hidden state is comparable or superior to a Transformer on grammaticality judgment tasks when trained on about 100M words, suggesting that architectures as complex as the Transformer may not always be necessary for syntactic learning.
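The core ESN mechanics referenced above can be sketched in a few lines: fixed random input and recurrent weights (rescaled so the spectral radius is below 1, giving the echo state property), with only a linear readout fitted by ridge regression, so no backpropagation is involved. The sketch below is illustrative only; the sizes, scaling constants, and toy next-symbol task are assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper scales only the hidden-state dimension.
n_in, n_hidden, n_out = 10, 500, 10

# Fixed random input and recurrent weights -- never trained.
W_in = rng.uniform(-0.5, 0.5, (n_hidden, n_in))
W = rng.normal(0.0, 1.0, (n_hidden, n_hidden))
# Rescale so the spectral radius is < 1 (echo state property).
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))

def run_reservoir(inputs):
    """Drive the reservoir with a sequence and collect hidden states."""
    h = np.zeros(n_hidden)
    states = []
    for x in inputs:
        h = np.tanh(W_in @ x + W @ h)
        states.append(h)
    return np.array(states)

# Toy data: one-hot symbols, predicting the next symbol
# (a stand-in for next-word prediction).
seq = rng.integers(0, n_in, size=200)
X = np.eye(n_in)[seq[:-1]]
Y = np.eye(n_out)[seq[1:]]

H = run_reservoir(X)

# Only the linear readout is trained, via closed-form ridge regression.
ridge = 1e-6
W_out = Y.T @ H @ np.linalg.inv(H.T @ H + ridge * np.eye(n_hidden))

pred = H @ W_out.T  # logits over next symbols, one row per time step
```

Because training reduces to one linear solve, scaling the model means only enlarging `n_hidden`, which is the sense in which the architecture grows without structural elaboration.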