Probing Internal Representations of Multi-Word Verbs in Large Language Models

📅 2025-02-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates how Transformer-based language models (e.g., BERT) represent multi-word verbs (MWVs; e.g., “give up”, “look at”), focusing on how their lexical and syntactic properties are encoded across layers. Using word-level and sentence-level probing classifiers together with a Generalized Discrimination Value (GDV) analysis, the authors characterize MWV representations layer by layer. Probing accuracy peaks in the intermediate layers. GDV analysis shows only weak linear separability, yet the probing classifiers achieve high accuracy, indicating that the semantic-syntactic distinction is encoded implicitly and non-linearly. The work thus provides systematic evidence of layer specificity and non-linear encoding in MWV representations, lending computational support to usage-based theories of linguistic representation.
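To make the probing setup concrete, the sketch below shows a minimal layer-wise probe. It assumes bert-base-uncased, mean-pooling over the verb and particle tokens, and a logistic-regression probe per layer; the paper's exact classifier, pooling strategy, and dataset are not given here, so these choices (and the toy sentences) are purely illustrative.

```python
# Minimal sketch: layer-wise probing of BERT representations for multi-word verbs.
# Assumptions (not from the paper): bert-base-uncased, mean-pooling over the
# verb + particle tokens, and a logistic-regression probe per layer.
import torch
from transformers import BertTokenizerFast, BertModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def layer_vectors(sentence, span):
    """Return one pooled vector per layer (embeddings + 12 encoder layers)
    for the MWV token span [start, end)."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states        # 13 tensors of shape (1, seq, 768)
    start, end = span
    return [h[0, start:end].mean(dim=0).numpy() for h in hidden]

# Toy examples: (sentence, token span of verb + particle, label)
# 0 = phrasal verb ("give up"), 1 = prepositional verb ("look at")
data = [
    ("She decided to give up smoking.", (4, 6), 0),
    ("They finally gave up the search.", (3, 5), 0),
    ("He had to look at the evidence.", (4, 6), 1),
    ("We will look at the proposal tomorrow.", (3, 5), 1),
    # ... a real experiment would need many more annotated sentences
]

per_layer = list(zip(*[layer_vectors(s, sp) for s, sp, _ in data]))
labels = [y for *_, y in data]
for layer, feats in enumerate(per_layer):
    probe = LogisticRegression(max_iter=1000)
    acc = cross_val_score(probe, list(feats), labels, cv=2).mean()
    print(f"layer {layer:2d}: probe accuracy = {acc:.2f}")
```

In practice the probe accuracies would be compared across layers; the paper reports the highest values in the middle layers.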

📝 Abstract
This study investigates the internal representations of verb-particle combinations, called multi-word verbs, within transformer-based large language models (LLMs), specifically examining how these models capture lexical and syntactic properties at different neural network layers. Using the BERT architecture, we analyze its layer-wise representations of two different verb-particle constructions: phrasal verbs like 'give up' and prepositional verbs like 'look at'. Our methodology includes training probing classifiers on the internal representations to classify these categories at both the word and sentence levels. The results indicate that the model's middle layers achieve the highest classification accuracies. To further analyze the nature of these distinctions, we conduct a data separability test using the Generalized Discrimination Value (GDV). While the GDV results show weak linear separability between the two verb types, probing classifiers still achieve high accuracy, suggesting that representations of these linguistic categories may be non-linearly separable. This aligns with previous research indicating that linguistic distinctions in neural networks are not always encoded in a linearly separable manner. These findings computationally support usage-based claims about the representation of verb-particle constructions and highlight the complex interaction between neural network architectures and linguistic structures.
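For reference, the GDV can be computed as in the sketch below. This follows a common formulation (z-scoring each feature dimension to mean 0 and standard deviation 0.5, then comparing mean intra-class and inter-class Euclidean distances, scaled by 1/sqrt(D)); the paper's exact implementation may differ, so treat this as an illustrative version only.

```python
# Sketch of the Generalized Discrimination Value (GDV) used to assess linear
# separability of layer representations. More negative values indicate stronger
# separability; values near zero indicate weak separability.
import numpy as np
from scipy.spatial.distance import pdist, cdist

def gdv(features, labels):
    """features: (n_samples, n_dims) array; labels: (n_samples,) class ids.
    Assumes at least two samples per class."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    # Normalize each dimension to zero mean and standard deviation 0.5.
    X = 0.5 * (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    classes = np.unique(y)
    D = X.shape[1]
    # Mean intra-class distance, averaged over classes.
    intra = np.mean([pdist(X[y == c]).mean() for c in classes])
    # Mean inter-class distance, averaged over all class pairs.
    inter = np.mean([cdist(X[y == a], X[y == b]).mean()
                     for i, a in enumerate(classes)
                     for b in classes[i + 1:]])
    return (intra - inter) / np.sqrt(D)

# Hypothetical usage, reusing the per-layer features and labels from the
# probing sketch above:
# for layer, feats in enumerate(per_layer):
#     print(f"layer {layer:2d}: GDV = {gdv(np.stack(feats), labels):.3f}")
```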
Problem

Research questions and friction points this paper is trying to address.

Analyzing how multi-word verbs are represented internally in LLMs
Examining which layers encode their lexical and syntactic properties
Investigating whether the two verb categories are linearly or only non-linearly separable
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-wise analysis of BERT's internal representations
Word-level and sentence-level probing classifiers
Linear separability testing with the GDV, contrasted with probe accuracy to reveal non-linear encoding (a toy illustration of this contrast follows below)
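The toy example below illustrates the interpretive point behind the last item: features can be nearly useless for a linear classifier yet highly separable for a non-linear probe. It is not the paper's setup; the XOR-like synthetic data simply stands in for layer representations of the two MWV classes, and the MLP probe is an assumed stand-in for any non-linear classifier.

```python
# Illustration (not from the paper): weak linear separability does not imply
# weak separability overall. An XOR-like dataset is linearly inseparable but
# easily separated by a small non-linear probe.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
centers = rng.integers(0, 2, size=(400, 2))           # four cluster centres
X = centers + rng.normal(scale=0.2, size=(400, 2))    # noisy 2-D "features"
y = centers[:, 0] ^ centers[:, 1]                     # XOR class labels

linear_probe = LogisticRegression()
nonlinear_probe = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)

print("linear probe accuracy    :", cross_val_score(linear_probe, X, y, cv=5).mean())
print("non-linear probe accuracy:", cross_val_score(nonlinear_probe, X, y, cv=5).mean())
```

On this toy data the linear probe stays near chance while the non-linear probe is close to perfect, mirroring the paper's pattern of weak GDV-measured linear separability alongside high probing accuracy.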
🔎 Similar Papers
No similar papers found.