Do Language Models Encode Knowledge of Linguistic Constraint Violations?

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates whether large language models encode representations of linguistic constraint violations in their parameters and selectively activate them when processing ungrammatical sentences. To this end, the authors employ sparse autoencoders to decompose model activations into sparse, monosemantic features and introduce an unsupervised sensitivity score to identify violation-related features. They further develop a falsification framework comprising three conjunctive criteria to systematically evaluate the selectivity and causal role of these features. The experiments reveal that no universal violation detector consistently meets the falsification criteria across linguistic phenomena; only a subset of phenomena exhibits limited evidence of selective, causally relevant structures.

📝 Abstract

Large Language Models (LLMs) achieve strong linguistic performance, yet their internal mechanisms for producing these predictions remain unclear. We investigate the hypothesis that LLMs encode representations of linguistic constraint violations within their parameters, which are selectively activated when processing ungrammatical sentences. To test this, we use sparse autoencoders to decompose polysemantic activations into sparse, monosemantic features and recover candidates for violation-related features. We introduce a sensitivity score for identifying features that are preferentially activated on constraint-violated versus well-formed inputs, enabling unsupervised detection of potential violation-specific features. We further propose a conjunctive falsification framework with three criteria evaluated jointly. Overall, the results are negative in two respects: (1) the falsification criteria are not jointly satisfied across linguistic phenomena, and (2) no features are consistently shared across all categories. While some phenomena show partial evidence of selective causal structure, the overall pattern provides limited support for a unified set of grammatical violation detectors in current LMs.
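
The abstract does not give the formula for the sensitivity score, so the following is only a minimal sketch of one plausible form: a normalized difference between a feature's mean activation on constraint-violated inputs and on well-formed inputs. The function names (`sensitivity_scores`, `top_candidates`), the `eps` smoothing term, and the toy activation values are all assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean

def sensitivity_scores(violated, well_formed, eps=1e-8):
    """Hypothetical sensitivity score per feature (an assumed form, not the paper's).

    violated, well_formed: lists of per-sentence activation vectors, one
    non-negative value per sparse-autoencoder feature.
    Returns one score per feature in [-1, 1]; values near 1 mean the feature
    fires mostly on violated inputs.
    """
    n_features = len(violated[0])
    scores = []
    for j in range(n_features):
        mu_v = mean(row[j] for row in violated)      # mean activation on violations
        mu_w = mean(row[j] for row in well_formed)   # mean activation on well-formed text
        scores.append((mu_v - mu_w) / (mu_v + mu_w + eps))
    return scores

def top_candidates(scores, k):
    """Indices of the k features with the highest score."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

# Toy activations for 3 features over 2 sentence pairs (made-up numbers).
violated = [[0.9, 0.1, 0.0], [0.7, 0.2, 0.0]]
well_formed = [[0.1, 0.1, 0.0], [0.1, 0.2, 0.0]]

scores = sensitivity_scores(violated, well_formed)
candidates = top_candidates(scores, k=1)
```

In this toy setup only the first feature is preferentially active on the violated sentences, so it is the one ranked as a candidate. No labels beyond the violated/well-formed pairing are needed, which is the sense in which such a score can be called unsupervised. A high score would only nominate a feature; whether it is selective and causally relevant is what the falsification criteria are meant to test.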

Problem

Research questions and friction points this paper is trying to address.

linguistic constraints

language models

grammatical violations

internal representations

constraint violations

Innovation

Methods, ideas, or system contributions that make the work stand out.

sparse autoencoders

sensitivity score

conjunctive falsification framework