Probing Syntax in Large Language Models: Successes and Remaining Challenges

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the confounding effects arising from uncontrolled corpora in evaluating structural probes’ syntactic sensitivity. We systematically assess probe interpretability through three controlled syntactic benchmarks—word-distance manipulation, deep dependency structure, and interfering linguistic phenomena—combined with linear probing and syntactic relation analysis. Results reveal core probe biases: strong sensitivity to surface-level word distance, poor detection of long-range dependencies and deep syntactic structures, and vulnerability to noun interference and non-canonical verb forms; yet probes remain robust to lexical predictability. Our primary contribution is the first systematic identification and empirical validation of inherent limitations in structural probes, establishing a fine-grained, controlled evaluation paradigm for syntactic representations. This framework provides a methodological foundation for reliable measurement of syntactic competence in large language models.

📝 Abstract
The syntactic structures of sentences can be readily read out from the activations of large language models (LLMs). However, the "structural probes" developed to reveal this phenomenon are typically evaluated on an indiscriminate set of sentences. Consequently, it remains unclear whether structural and/or statistical factors systematically affect these syntactic representations. To address this issue, we conduct an in-depth analysis of structural probes on three controlled benchmarks. Our results are three-fold. First, structural probes are biased by a superficial property: the closer two words are in a sentence, the more likely structural probes are to consider them syntactically linked. Second, structural probes are challenged by linguistic properties: they poorly represent deep syntactic structures, and they suffer interference from interacting nouns and ungrammatical verb forms. Third, structural probes do not appear to be affected by the predictability of individual words. Overall, this work sheds light on the current challenges faced by structural probes, and provides a benchmark of controlled stimuli to better evaluate their performance.
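To make the abstract's notion of "reading out" syntax concrete: a linear structural probe (in the style of Hewitt & Manning, 2019, which may differ from the exact probes used in this paper) learns a matrix B such that squared L2 distances between probe-projected word vectors approximate pairwise distances in the parse tree. The sketch below is illustrative only: the dimensions, the random "activations", and the untrained probe matrix are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical values): a 5-word sentence with 16-dim activations.
n_words, d_model, d_probe = 5, 16, 4
H = rng.normal(size=(n_words, d_model))   # one "LLM" activation vector per word
B = rng.normal(size=(d_model, d_probe))   # the linear probe matrix (untrained here)

def probe_distances(H, B):
    """Squared L2 distances between probe-projected word vectors.

    Under the structural-probe hypothesis, a trained B makes these
    distances approximate the distances between words in the parse tree.
    """
    Z = H @ B                                # project activations into probe space
    diff = Z[:, None, :] - Z[None, :, :]     # pairwise differences, shape (n, n, d_probe)
    return (diff ** 2).sum(axis=-1)          # shape (n, n)

D = probe_distances(H, B)
print(D.shape)  # (5, 5): one predicted distance per word pair
```

Training would then minimize the gap between D and gold tree distances; the paper's benchmarks probe where such predicted distances go wrong (e.g., defaulting to surface word distance).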
Problem

Research questions and friction points this paper is trying to address.

Structural probes' bias towards word proximity in syntax analysis
Challenges in representing deep syntactic structures accurately
Impact of linguistic properties on structural probe performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzing structural probes on controlled benchmarks
Identifying biases in syntactic link predictions
Evaluating linguistic challenges in deep syntax
Pablo J. Diego-Simón
LSCP, ENS, PSL, EHESS, CNRS, Paris, France
Emmanuel Chemla
LSCP, ENS, Paris
Jean-Rémi King
Meta
neuroscience · artificial intelligence · human cognition · decoding
Yair Lakretz
LSCP, ENS, PSL, EHESS, CNRS, Paris, France