Deep networks learn to parse uniform-depth context-free languages from local statistics

📅 2026-01-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates how linguistic structure can be learned solely from sentences, focusing on the statistical properties and sample complexity underlying parsing capabilities. To this end, we introduce a family of probabilistic context-free grammars (PCFGs) with controllable ambiguity and cross-scale dependencies, along with a hierarchical inference algorithm inspired by deep convolutional networks. Our theoretical analysis establishes formal connections among linguistic statistics, data quantity, and learnability. Empirical evaluations across architectures—including deep convolutional networks and Transformers—demonstrate that cross-scale correlations substantially enhance parsing performance, suggesting that hierarchical linguistic representations can naturally emerge from local statistical regularities in the data.

📝 Abstract
Understanding how the structure of language can be learned from sentences alone is a central question in both cognitive science and machine learning. Studies of the internal representations of Large Language Models (LLMs) support their ability to parse text when predicting the next word, while representing semantic notions independently of surface form. Yet, which data statistics make these feats possible, and how much data is required, remain largely unknown. Probabilistic context-free grammars (PCFGs) provide a tractable testbed for studying these questions. However, prior work has focused either on the post-hoc characterization of the parsing-like algorithms used by trained networks; or on the learnability of PCFGs with fixed syntax, where parsing is unnecessary. Here, we (i) introduce a tunable class of PCFGs in which both the degree of ambiguity and the correlation structure across scales can be controlled; (ii) provide a learning mechanism -- an inference algorithm inspired by the structure of deep convolutional networks -- that links learnability and sample complexity to specific language statistics; and (iii) validate our predictions empirically across deep convolutional and transformer-based architectures. Overall, we propose a unifying framework where correlations at different scales lift local ambiguities, enabling the emergence of hierarchical representations of the data.
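To make the object of study concrete, here is a minimal toy sketch of sampling from a uniform-depth PCFG of the kind the title refers to: every production is binary and every derivation runs for exactly the same number of levels, so all leaves sit at the same depth and strings have length 2^depth. The grammar, weights, and symbol names below are illustrative inventions, not the paper's actual construction; note how the nonterminals "A" and "B" share a production, which is one simple way local ambiguity can arise.

```python
import random

def sample_uniform_depth(symbol, rules, depth, rng):
    """Expand `symbol` for exactly `depth` levels. Every production is
    binary, so the resulting string always has length 2**depth."""
    if depth == 0:
        return [symbol]
    productions, weights = zip(*rules[symbol])
    left, right = rng.choices(productions, weights=weights, k=1)[0]
    return (sample_uniform_depth(left, rules, depth - 1, rng)
            + sample_uniform_depth(right, rules, depth - 1, rng))

# Hypothetical toy grammar: each symbol has weighted binary productions.
# 'A' and 'B' share the production ('a', 'b'), creating local ambiguity:
# an observed "ab" substring does not by itself reveal its parent symbol.
rules = {
    "S": [(("A", "B"), 0.5), (("B", "A"), 0.5)],
    "A": [(("a", "b"), 0.7), (("a", "a"), 0.3)],
    "B": [(("a", "b"), 0.4), (("b", "b"), 0.6)],
    "a": [(("a", "a"), 1.0)],  # terminals just duplicate at lower levels
    "b": [(("b", "b"), 1.0)],
}

rng = random.Random(0)
sentence = sample_uniform_depth("S", rules, 3, rng)
print("".join(sentence))  # a string of length 2**3 = 8 over {a, b}
```

Resolving which hidden symbol produced each ambiguous substring requires looking at correlations with more distant parts of the string, which is the mechanism the abstract describes as "correlations at different scales lift local ambiguities".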
Problem

Research questions and friction points this paper is trying to address.

context-free languages
language learning
data statistics
sample complexity
hierarchical representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

probabilistic context-free grammars
deep convolutional networks
hierarchical representations
sample complexity
multi-scale correlations
Jack T. Parley
Institute of Physics, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Francesco Cagnetta
Theoretical and Scientific Data Science, SISSA, Trieste, Italy
Matthieu Wyart
Professor of Physics, Johns Hopkins