🤖 AI Summary
This work addresses the challenge of modelling long-distance syntactic dependencies, such as those found in object wh-questions, in child language acquisition. We propose the first probabilistic generative model that jointly learns lexical semantics and language-specific syntax from authentic child-directed speech annotated with logical-form meaning representations. Methodologically, we adopt a semantics-constrained parsing framework in which logical forms guide learning, enabling simultaneous induction of word meanings, inference of syntactic structure, and reconstruction of sentence meaning. Unlike conventional context-free models, ours captures trans-context-free dependencies, handling complex constructions such as object wh-questions. Key contributions include: (i) the first joint probabilistic model of lexicon and syntax trained on real child-directed data; (ii) a demonstration that modelling these long-range dependencies exploits grammatical machinery beyond context-free expressivity; and (iii) empirical evidence that the trained model can recover meanings from surface strings alone, establishing a computational paradigm for language acquisition.
📝 Abstract
This work develops a probabilistic child language acquisition model that learns a range of linguistic phenomena, most notably the long-range syntactic dependencies found in object wh-questions, among other constructions. The model is trained on a corpus of real child-directed speech in which each utterance is paired with a logical form as its meaning representation. It learns word meanings and language-specific syntax simultaneously. After training, the model can deduce the correct parse tree and word meanings for a given utterance-meaning pair, and can infer the meaning when given only the utterance. The successful modelling of long-range dependencies is theoretically important because it exploits aspects of the model that are, in general, trans-context-free.
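As a minimal sketch of the training setup described above (in Python, with an invented example pair and a hypothetical `model` interface; the paper's actual corpus annotation and code are not shown here), each training item pairs a child-directed utterance with a logical form, and the trained model supports two modes of inference:

```python
from dataclasses import dataclass

@dataclass
class TrainingPair:
    """One child-directed utterance paired with its logical-form meaning."""
    utterance: list[str]   # tokenised surface string
    logical_form: str      # meaning representation (lambda-calculus style)

# Hypothetical object wh-question: the fronted "what" fills the object
# argument of "read", the kind of long-range dependency at issue.
example = TrainingPair(
    utterance=["what", "did", "you", "read"],
    logical_form="Q(lambda x. read(you, x))",
)

# After training, a model of this kind (interface sketched, not the authors' API)
# would be used in two ways:
#   1. Given an utterance-meaning pair, recover the parse tree and word meanings.
#   2. Given only the utterance, infer its meaning.
# parse_tree, word_meanings = model.parse(example.utterance, example.logical_form)
# inferred_meaning = model.infer_meaning(example.utterance)
```

The logical form shown is purely illustrative; the point is that supervision comes from utterance-meaning pairs rather than from syntactic annotation, and that meaning recovery from the surface string alone is an inference task for the trained model.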