🤖 AI Summary
This study investigates how syntactic structure is built up across layers in neural language models such as BERT. It asks whether micro-syntactic structures (e.g., subject noun phrases) emerge early in lower layers, while macro-syntactic structures (e.g., the relations between a root verb and its direct dependents) are integrated only in higher layers. To this end, we propose **Derivational Probing**, a structured probing method that explicitly models syntax as a bottom-up derivational process. Experiments show that micro-syntactic structures are robustly decodable in lower layers, whereas macro-syntactic relations emerge progressively and stabilize in middle-to-high layers. Crucially, selecting the optimal layer for integrating global syntactic information improves accuracy on syntax-sensitive tasks such as subject–verb number agreement by up to 2.3%. Our work traces how syntactic abstraction develops layer by layer in transformer-based models and points toward more interpretable, syntax-aware models.
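The text does not say how "emerging" or "stabilizing" across layers is quantified. One common way to summarize where a probed structure becomes decodable is a center-of-gravity statistic over per-layer score gains. The sketch below is a hypothetical illustration of that idea, not the paper's metric. It assumes one probe score per layer (index 0 being the embedding layer) and, as a simplifying choice, counts only positive gains between consecutive layers. The helper name `expected_layer` and the toy score curves are invented for illustration and are not results from the paper.

```python
from typing import Sequence


def expected_layer(scores: Sequence[float]) -> float:
    """Center of gravity of per-layer probe-score gains (hypothetical helper).

    scores[l] is the probe score at layer l, with l = 0 the embedding layer.
    The gain at layer l >= 1 is scores[l] - scores[l - 1], clipped at zero
    (a simplifying assumption). A lower value means the structure becomes
    decodable earlier in the network.
    """
    if len(scores) < 2:
        raise ValueError("need scores for at least two layers")
    layers = range(1, len(scores))
    gains = [max(scores[l] - scores[l - 1], 0.0) for l in layers]
    total = sum(gains)
    if total == 0.0:
        raise ValueError("scores never increase across layers")
    # Weight each layer index by its share of the total gain.
    return sum(l * g for l, g in zip(layers, gains)) / total


# Toy, made-up score curves (not results from the paper): one structure is
# decodable early, the other only becomes decodable in later layers.
early = [0.20, 0.70, 0.80, 0.80, 0.80]
late = [0.20, 0.25, 0.30, 0.60, 0.80]
print(expected_layer(early) < expected_layer(late))  # True
```

Under this kind of summary, micro-syntactic structures would be expected to have a lower expected layer than macro-syntactic ones. Whether the paper uses this statistic, a threshold-based criterion, or something else entirely is not stated here.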
📝 Abstract
Recent work has demonstrated that neural language models encode syntactic structures in their internal representations, yet the derivations by which these structures are constructed across layers remain poorly understood. In this paper, we propose Derivational Probing to investigate how micro-syntactic structures (e.g., subject noun phrases) and macro-syntactic structures (e.g., the relations between root verbs and their direct dependents) are constructed as word embeddings propagate upward across layers. Our experiments on BERT reveal a clear bottom-up derivation: micro-syntactic structures emerge in lower layers and are gradually integrated into a coherent macro-syntactic structure in higher layers. Furthermore, a targeted evaluation on subject–verb number agreement shows that the timing of constructing macro-syntactic structures is critical for downstream performance, suggesting that there is an optimal point at which to integrate global syntactic information.
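The abstract contrasts macro-syntactic structure (relations between a root verb and its direct dependents) with micro-syntactic structure (smaller units such as subject noun phrases). The sketch below makes that split concrete on a single dependency tree. It is a hypothetical illustration, not the authors' Derivational Probing implementation. It assumes CoNLL-U-style head indices (1-based, 0 marking the root) and treats every arc inside a direct dependent's subtree as micro structure, grouped by that dependent. The function names are invented for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Arc = Tuple[int, int]  # (head, dependent), 1-based token indices


def direct_dependent_of_root(tok: int, heads: List[int], root: int) -> int:
    """Climb from `tok` toward the root; return the ancestor attached to the root."""
    while heads[tok - 1] != root:
        tok = heads[tok - 1]
    return tok


def split_micro_macro(heads: List[int]) -> Tuple[Dict[int, List[Arc]], List[Arc]]:
    """Split the arcs of a single-rooted dependency tree (hypothetical helper).

    - macro arcs: root verb -> each of its direct dependents
    - micro arcs: arcs inside each direct dependent's subtree, keyed by that dependent
    heads[i] is the 1-based head of token i + 1; 0 marks the root (CoNLL-U HEAD).
    """
    root = heads.index(0) + 1
    macro: List[Arc] = []
    micro: Dict[int, List[Arc]] = defaultdict(list)
    for dep, head in enumerate(heads, start=1):
        if head == 0:
            continue  # the root word has no incoming arc
        if head == root:
            macro.append((head, dep))
        else:
            micro[direct_dependent_of_root(dep, heads, root)].append((head, dep))
    return dict(micro), macro


# "The keys to the cabinet open the door" with a UD-style tree rooted at "open" (6):
# The->keys, keys->open, to->cabinet, the->cabinet, cabinet->keys, the->door, door->open
heads = [2, 6, 5, 5, 2, 0, 8, 6]
micro, macro = split_micro_macro(heads)
print(micro)  # {2: [(2, 1), (5, 3), (5, 4), (2, 5)], 8: [(8, 7)]}
print(macro)  # [(6, 2), (6, 8)]
```

In a bottom-up derivation of the kind the abstract describes, the micro groups (here the subject noun phrase "The keys to the cabinet", headed by token 2) would be assembled first, and the two macro arcs from the root verb "open" last.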