🤖 AI Summary
This study investigates whether large language models (LLMs) encode Minimalist Program (MP) phase structure, such as phase boundaries and phase-internal cohesion, that Universal Dependencies (UD) does not capture. Using wh-movement stimuli that hold UD distance constant across conditions, together with structural probing and activation patching across models, the work presents evidence that LLMs acquire formal-syntactic abstractions beyond UD representations from distributional pretraining. Across 13 LLMs from four families, the authors observe a phase-count gradient on a cross-clause pair in 12 models and a sign asymmetry on a within-clause pair in all 13, the latter predicted by phase-internal cohesion. Activation patching indicates that the representations are causally active in 12 of the 13 models, which suggests that UD-based probes provide a lower bound on syntactic encoding in neural models rather than an upper bound.
📝 Abstract
Structural probes are trained on Universal Dependencies (UD), which does not encode formal-syntactic abstractions such as phase boundaries or phase-internal cohesion. Whether large language models (LLMs) encode these abstractions remains an open question that UD-based probing cannot answer by construction. We evaluate structural probes on wh-movement stimuli whose UD distances are invariant across conditions by design, so any non-zero effect reflects structure beyond UD. The three conditions (bare small clause, infinitival, and finite) are ordered by the number of Minimalist Program (MP) phase boundaries the wh-element crosses.
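To make the design concrete, the following is a minimal sketch of how a probe-distance contrast between two conditions could be computed. It assumes a linear structural probe that measures squared L2 distance after a learned projection; the source does not specify the probe's form, and the names `probe_distance`, `condition_contrast`, and the toy vectors are hypothetical illustrations, not the authors' code.

```python
def probe_distance(B, h_i, h_j):
    """Squared L2 distance between two token vectors after the linear map B.

    Assumption: a distance probe of the form ||B (h_i - h_j)||^2.
    B is a list of rows; h_i and h_j are equal-length lists of floats.
    """
    diff = [a - b for a, b in zip(h_i, h_j)]
    projected = [sum(w * d for w, d in zip(row, diff)) for row in B]
    return sum(p * p for p in projected)


def condition_contrast(B, pairs_a, pairs_b):
    """Mean probe distance in condition b minus mean probe distance in condition a.

    Each element of pairs_a / pairs_b is a (h_i, h_j) tuple for the same
    word pair, whose UD tree distance is held constant across conditions.
    """
    mean_a = sum(probe_distance(B, hi, hj) for hi, hj in pairs_a) / len(pairs_a)
    mean_b = sum(probe_distance(B, hi, hj) for hi, hj in pairs_b) / len(pairs_b)
    return mean_b - mean_a


# Toy example with an identity probe and made-up 2-dimensional vectors.
B = [[1.0, 0.0], [0.0, 1.0]]
small_clause = [([1.0, 0.0], [0.0, 0.0])]
finite = [([2.0, 0.0], [0.0, 0.0])]
contrast = condition_contrast(B, small_clause, finite)
```

Because the UD distance of the word pair is the same in both conditions, a probe that tracked UD distance alone would give a contrast near zero; a contrast that is reliably non-zero is what the design treats as structure beyond UD.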
Across 13 LLMs from four families, we find a phase-count gradient on a cross-clause pair (12/13 models) and a sign asymmetry in all 13 models on a within-clause pair whose UD distance is identical across conditions. The latter is specifically predicted by phase-internal cohesion, an MP abstraction invisible to UD by construction. Activation patching confirms the representations are causally active in 12/13 models. These findings suggest that distributional pretraining can induce representations aligned with formal-syntactic abstractions beyond the reach of annotation-based probing; UD-grounded probes provide a lower bound on syntactic encoding, not an upper bound.
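As a rough illustration of how strong a 13/13 or 12/13 agreement is, the sketch below computes an exact two-sided sign-test p-value under a chance null of 0.5 per model. The source does not say which statistical test the authors use, so the choice of test and the function name `sign_test_p` are assumptions; the calculation also treats models as independent, which models from the same family may not be.

```python
from math import comb


def sign_test_p(successes, n):
    """Exact two-sided sign-test p-value under a null probability of 0.5.

    successes: number of models showing the predicted sign.
    n: total number of models.
    """
    # Use the more extreme of the two counts so the test is symmetric.
    k = max(successes, n - successes)
    # Probability of observing k or more agreements out of n by chance.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double the tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)


p_all = sign_test_p(13, 13)   # 2 / 8192
p_most = sign_test_p(12, 13)  # 28 / 8192
```

Under these assumptions, a unanimous sign across 13 models corresponds to a p-value of 2/8192, about 0.00024, and 12 of 13 to 28/8192, about 0.0034.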