Modelling Child Learning and Parsing of Long-range Syntactic Dependencies

📅 2025-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of modeling long-distance syntactic dependencies, such as those in object wh-questions, in child language acquisition. We propose the first probabilistic generative model that jointly learns lexical semantics and language-specific syntax from authentic child-directed speech annotated with logical-form semantics. Methodologically, we adopt a semantics-constrained parsing framework in which logical forms serve as an intermediate representation, enabling simultaneous lexical semantic induction, syntactic structure inference, and semantic reconstruction. Unlike conventional context-free models, ours captures trans-context-free dependencies with high precision on complex constructions such as object wh-questions. Key contributions include: (i) the first joint probabilistic model of lexicon and syntax trained on real child-directed data; (ii) a theoretical advance extending formal grammar expressivity beyond context-free limitations; and (iii) empirical evidence that surface strings alone suffice for reliable semantic reconstruction, establishing a novel computational paradigm for language acquisition.


📝 Abstract
This work develops a probabilistic child language acquisition model to learn a range of linguistic phenomena, most notably the long-range syntactic dependencies found in object wh-questions, among other constructions. The model is trained on a corpus of real child-directed speech, where each utterance is paired with a logical form as a meaning representation. It then learns both word meanings and language-specific syntax simultaneously. After training, the model can deduce the correct parse tree and word meanings for a given utterance-meaning pair, and can infer the meaning if given only the utterance. The successful modelling of long-range dependencies is theoretically important because it exploits aspects of the model that are, in general, trans-context-free.
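The core learning setting described in the abstract, inducing word meanings from utterances paired with logical forms, can be illustrated with a toy sketch. This is not the paper's actual model; it is a minimal cross-situational co-occurrence counter, with invented example data, that shows how pairing surface strings with meaning representations lets a learner associate words with logical-form predicates.

```python
from collections import defaultdict

def learn_lexicon(pairs):
    """Toy cross-situational learner (illustrative only, not the
    paper's model): count how often each word co-occurs with each
    logical-form predicate, then map each word to its most frequent
    predicate."""
    counts = defaultdict(lambda: defaultdict(int))
    for utterance, logical_form in pairs:
        for word in utterance.split():
            for predicate in logical_form:
                counts[word][predicate] += 1
    return {word: max(preds, key=preds.get) for word, preds in counts.items()}

# Hypothetical utterance / logical-form pairs standing in for
# annotated child-directed speech.
pairs = [
    ("the dog barks", ("dog", "bark")),
    ("the dog runs", ("dog", "run")),
    ("the cat runs", ("cat", "run")),
]
lexicon = learn_lexicon(pairs)
```

The paper's model goes far beyond this: it learns syntax jointly with the lexicon and handles long-range dependencies, but the sketch shows why utterance-meaning pairs carry enough signal for lexical induction to get off the ground.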
Problem

Research questions and friction points this paper is trying to address.

How can long-range syntactic dependencies, such as those in object wh-questions, be acquired by a child language learner?
Can word meanings and language-specific syntax be learned jointly rather than in isolation?
Does real child-directed speech, paired with logical forms, provide enough signal for parsing and meaning inference?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic model for child language acquisition
Training on child-directed speech with logical forms
Simultaneous learning of word meanings and syntax