Emergence of Phonemic, Syntactic, and Semantic Representations in Artificial Neural Networks

📅 2026-01-26
📈 Citations: 0
✹ Influential: 0
đŸ€– AI Summary
This study addresses the absence of a unified computational framework for explaining how neural representations of phonemes, words, and syntax emerge during language acquisition. By applying geometric analysis of neural activations and subspace modeling, the authors systematically examine the evolution of representational structure in deep learning models trained on speech and text. Their work reveals, for the first time, that artificial neural networks spontaneously develop a hierarchical sequence of linguistic representations mirroring child language acquisition—progressively forming phonemic, lexical, and syntactic subspaces. Furthermore, the study quantifies that such models require two to four orders of magnitude more data than human children to achieve comparable representational structures, offering a novel computational perspective on the mechanisms and data efficiency disparities underlying language acquisition.

📝 Abstract
During language acquisition, children successively learn to categorize phonemes, identify words, and combine them with syntax to form new meaning. While the development of this behavior is well characterized, we still lack a unifying computational framework to explain its underlying neural representations. Here, we investigate whether and when phonemic, lexical, and syntactic representations emerge in the activations of artificial neural networks during their training. Our results show that both speech- and text-based models follow a sequence of learning stages: during training, their neural activations successively build subspaces, where the geometry of the neural activations represents phonemic, lexical, and syntactic structure. While this developmental trajectory qualitatively relates to children's, it is quantitatively different: These algorithms indeed require two to four orders of magnitude more data for these neural representations to emerge. Together, these results show conditions under which major stages of language acquisition spontaneously emerge, and hence delineate a promising path to understand the computations underpinning language acquisition.
Problem

Research questions and friction points this paper is trying to address.

language acquisition
neural representations
phonemic
syntactic
semantic
Innovation

Methods, ideas, or system contributions that make the work stand out.

neural representations
language acquisition
developmental trajectory
artificial neural networks
syntactic structure
Pierre Orhan
Paris Brain Institute
Pablo Diego-Simón
Laboratoire de Sciences Cognitives et Psycholinguistique (LSCP), Département d’Etudes Cognitives, École Normale Supérieure, PSL University, CNRS
Emmanuel Chemla
Laboratoire de Sciences Cognitives et Psycholinguistique (LSCP), Département d’Etudes Cognitives, École Normale Supérieure, PSL University, CNRS
Yair Lakretz
Laboratoire de Sciences Cognitives et Psycholinguistique (LSCP), Département d’Etudes Cognitives, École Normale Supérieure, PSL University, CNRS
Yves Boubenec
Associate Professor, MdC ENS-Ulm, Paris
audition
auditory cortex
prefrontal cortex
Jean-Rémi King
Meta
neuroscience
artificial intelligence
human cognition
decoding