Bearing Syntactic Fruit with Stack-Augmented Neural Networks

📅 2025-11-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether neural networks can intrinsically prefer hierarchical syntactic generalizations, mimicking human-like language acquisition, without syntactic supervision, large-scale pretraining, or extended training. Method: We propose a nondeterministic stack-augmented neural architecture that enhances the stack RNN mechanism and integrates it with mainstream backbones, including the Transformer and the LSTM. Contribution/Results: On the classic question formation task, the nondeterministic stack-augmented Transformer achieves substantial gains over baselines, attaining, for the first time with zero syntactic annotations, zero pretraining, and short training, hierarchical inductive biases comparable to those observed in human children. These results suggest that explicit hierarchical memory mechanisms are key to improving grammatical generalization in neural models. The implementation is publicly available.

📝 Abstract
Any finite set of training data is consistent with an infinite number of hypothetical algorithms that could have generated it. Studies have shown that when human children learn language, they consistently favor hypotheses based on hierarchical syntactic rules without ever encountering disambiguating examples. A recent line of work has inquired as to whether common neural network architectures share this bias, finding that they do so only under special conditions: when syntactically supervised, when pre-trained on massive corpora, or when trained long past convergence. In this paper, we demonstrate, for the first time, neural network architectures that are able to generalize in human-like fashion without any of the aforementioned requirements: stack-augmented neural networks. We test three base architectures (transformer, simple RNN, LSTM) augmented with two styles of stack: the superposition stack of Joulin & Mikolov (2015) and a nondeterministic generalization of it proposed by DuSell & Chiang (2023). We find that transformers with nondeterministic stacks generalize best out of these architectures on a classical question formation task. We also propose a modification to the stack RNN architecture that improves hierarchical generalization. These results suggest that stack-augmented neural networks may be more accurate models of human language acquisition than standard architectures, serving as useful objects of psycholinguistic study. Our code is publicly available.
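The superposition stack named in the abstract (Joulin & Mikolov, 2015) keeps the stack differentiable by computing each update as a convex combination of the push, pop, and no-op outcomes, weighted by a learned action distribution. A minimal NumPy sketch of one update step, with illustrative names and shapes (not the paper's actual implementation):

```python
import numpy as np

def superposition_stack_update(stack, v, actions):
    """One differentiable stack update in the style of Joulin & Mikolov (2015).

    stack   : (depth, dim) array; row 0 is the top of the stack.
    v       : (dim,) vector to push.
    actions : (3,) nonnegative weights (push, pop, no-op) summing to 1,
              typically produced by a softmax over controller outputs.
    """
    a_push, a_pop, a_noop = actions
    depth, dim = stack.shape
    # Push: v becomes the new top; every existing row shifts down one slot.
    pushed = np.vstack([v, stack[:-1]])
    # Pop: every row shifts up one slot; the bottom slot is zero-filled.
    popped = np.vstack([stack[1:], np.zeros((1, dim))])
    # Superpose the three outcomes, weighted by the action distribution.
    return a_push * pushed + a_pop * popped + a_noop * stack
```

With a one-hot action distribution this reduces to an ordinary discrete stack operation; with soft weights the stack holds a weighted mixture of possible contents, which is what lets gradients flow through the stack decisions during training.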
Problem

Research questions and friction points this paper is trying to address.

Neural networks struggle to generalize hierarchically over syntax without special conditions.
Standard architectures show a human-like hierarchical bias only with syntactic supervision, massive pretraining, or training long past convergence.
Open question: can stack-augmented networks generalize syntactically without any of these requirements?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stack-augmented neural networks generalize in human-like fashion without supervision, pretraining, or overtraining
Transformers with nondeterministic stacks generalize best on the question formation task
A modified stack RNN architecture improves hierarchical generalization