Large Language Models as Neurolinguistic Subjects: Discrepancy in Performance and Competence for Form and Meaning

📅 2024-11-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit a dissociation between performance and competence in representing linguistic form (signifier) versus meaning (signified), challenging assumptions about their linguistic competence. Method: We propose the first neurolinguistic evaluation paradigm for LLMs, moving beyond traditional psycholinguistic approaches. We construct bilingual conceptual minimal-pair datasets in Chinese and German (COMPS-ZH, COMPS-DE) and combine diagnostic probing with cross-layer activation analysis to systematically characterize how form and meaning are encoded across hidden layers. Contribution/Results: We identify a pervasive performance–competence dissociation: instruction tuning improves task performance without enhancing deep semantic representation; form representations are robust and cross-lingually consistent, whereas meaning representations remain weak; and model output probabilities do not reliably reflect underlying linguistic competence. Our work establishes a theoretical framework and an empirical benchmark for assessing LLMs' true language capabilities.
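
As a rough illustration of the layer-wise probing described above, the sketch below extracts per-layer hidden states for minimal-pair sentences and fits a simple diagnostic classifier on each layer. The model (`gpt2`), the toy pairs, and the logistic-regression probe are placeholder assumptions, not the authors' actual configuration.

```python
# Minimal sketch of layer-wise diagnostic probing on minimal pairs.
# Model, data, and probe choice are illustrative placeholders.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

MODEL = "gpt2"  # placeholder; any model exposing hidden states works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, output_hidden_states=True).eval()

# Toy minimal pairs: (acceptable, unacceptable) for one conceptual property.
pairs = [
    ("A robin is a bird.", "A robin is a fish."),
    ("A hammer is a tool.", "A hammer is a fruit."),
]

def layer_vecs(text):
    """Return one mean-pooled hidden-state vector per layer for a sentence."""
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"))
    return [h[0].mean(dim=0).numpy() for h in out.hidden_states]

# Cache activations for both members of every pair.
feats = [(layer_vecs(good), layer_vecs(bad)) for good, bad in pairs]
n_layers = len(feats[0][0])

for layer in range(n_layers):
    X = [g[layer] for g, _ in feats] + [b[layer] for _, b in feats]
    y = [1] * len(feats) + [0] * len(feats)
    # Per-layer probe accuracy indicates where the distinction is encoded.
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=2).mean()
    print(f"layer {layer:2d}: probe accuracy {acc:.2f}")
```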

📝 Abstract
This study investigates the linguistic understanding of Large Language Models (LLMs) regarding signifier (form) and signified (meaning) by distinguishing two LLM assessment paradigms: psycholinguistic and neurolinguistic. Traditional psycholinguistic evaluations often reflect statistical rules that may not accurately represent LLMs' true linguistic competence. We introduce a neurolinguistic approach, utilizing a novel method that combines minimal pairs and diagnostic probing to analyze activation patterns across model layers. This method allows for a detailed examination of how LLMs represent form and meaning, and whether these representations are consistent across languages. We found that: (1) psycholinguistic and neurolinguistic methods reveal that language performance and competence are distinct; (2) direct probability measurement may not accurately assess linguistic competence; (3) instruction tuning does little to change competence, although it improves performance; (4) LLMs exhibit higher competence and performance for form than for meaning. Additionally, we introduce new conceptual minimal pair datasets for Chinese (COMPS-ZH) and German (COMPS-DE), complementing existing English datasets.
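
For contrast with the probing sketch above, the psycholinguistic-style measurement mentioned in the abstract (direct probability comparison over a minimal pair) could look roughly like this; the model and sentences are again placeholders, not items from the COMPS benchmarks.

```python
# Minimal sketch of minimal-pair scoring by output probability.
# Model and example pair are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL = "gpt2"
tok = AutoTokenizer.from_pretrained(MODEL)
lm = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def sentence_logprob(text):
    """Total log-probability the LM assigns to a sentence."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # labels=ids yields the mean cross-entropy over predicted tokens
        loss = lm(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

good, bad = "A robin is a bird.", "A robin is a fish."
lp_good, lp_bad = sentence_logprob(good), sentence_logprob(bad)
print(f"{lp_good:.2f} vs {lp_bad:.2f} -> prefers:",
      "acceptable" if lp_good > lp_bad else "unacceptable")
```

In the paper's terms, this probability preference measures performance, whereas the layer-wise probe above targets competence in the underlying representations.
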
Problem

Research questions and friction points this paper is trying to address.

Assess LLMs' understanding of linguistic form and meaning
Compare psycholinguistic and neurolinguistic evaluation paradigms
Analyze LLMs' competence and performance across languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neurolinguistic approach
Minimal pair probing
Model layer analysis