Construction Identification and Disambiguation Using BERT: A Case Study of NPN

📅 2025-03-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates whether BERT implicitly encodes knowledge of the form and meaning of the English noun-preposition-noun (NPN) construction. Method: We introduce a benchmark dataset of semantically annotated corpus instances of NPN, including distractors that superficially resemble the construction, and train probing classifiers on BERT embeddings to evaluate construction identification and sense disambiguation. We further conduct embedding analyses and word-order perturbation experiments in which true instances of the construction are artificially permuted. Contribution/Results: The probing classifiers discriminate NPN instances from distractors and disambiguate among the construction's senses reasonably well. They reject instances whose word order has been permuted, which indicates sensitivity to form. Together, these results suggest that BERT latently encodes at least some knowledge of the NPN construction beyond a surface syntactic pattern and lexical cues, providing evidence at the intersection of Construction Grammar and representation learning.

📝 Abstract

Construction Grammar hypothesizes that knowledge of a language consists chiefly of knowledge of form-meaning pairs ("constructions") that include vocabulary, general grammar rules, and even idiosyncratic patterns. Recent work has shown that transformer language models represent at least some constructional patterns, including ones where the construction is rare overall. In this work, we probe BERT's representation of the form and meaning of a minor construction of English, the NPN (noun-preposition-noun) construction, exhibited in such expressions as "face to face" and "day to day", which is known to be polysemous. We construct a benchmark dataset of semantically annotated corpus instances (including distractors that superficially resemble the construction). With this dataset, we train and evaluate probing classifiers. They achieve decent discrimination of the construction from distractors, as well as sense disambiguation among true instances of the construction, revealing that BERT embeddings carry indications of the construction's semantics. Moreover, artificially permuting the word order of true construction instances causes them to be rejected, indicating sensitivity to matters of form. We conclude that BERT does latently encode at least some knowledge of the NPN construction going beyond a surface syntactic pattern and lexical cues.
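The probing setup described in the abstract can be illustrated with a minimal sketch: a simple classifier is trained on frozen feature vectors, so any success has to come from information already present in those vectors. Everything below is an assumption made for illustration. The abstract does not specify the probe architecture, and the three-dimensional vectors in `TOY_FEATURES` are hand-written placeholders standing in for BERT embeddings, not real data or results. The names `train_probe`, `predict`, `TOY_FEATURES`, and `TOY_LABELS` are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(features, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression probe with plain SGD.

    The feature vectors are treated as fixed inputs: only the probe's
    weights and bias are updated, never the vectors themselves.
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x):
    # Label 1 = "true NPN instance", label 0 = "distractor" (assumed encoding).
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else 0


# Placeholder vectors (NOT real BERT embeddings): two per class.
TOY_FEATURES = [
    [1.0, 0.2, 0.1],
    [0.9, 0.1, 0.3],
    [0.1, 0.8, 0.2],
    [0.2, 0.9, 0.1],
]
TOY_LABELS = [1, 1, 0, 0]

weights, bias = train_probe(TOY_FEATURES, TOY_LABELS)
predictions = [predict(weights, bias, x) for x in TOY_FEATURES]
accuracy = sum(p == y for p, y in zip(predictions, TOY_LABELS)) / len(TOY_LABELS)
print("training accuracy on toy data:", accuracy)
```

In a realistic setting the placeholder vectors would be replaced by embeddings extracted from BERT for each annotated instance, and accuracy would be measured on held-out instances rather than on the training data.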

Problem

Research questions and friction points this paper is trying to address.

Identify NPN constructions using BERT embeddings

Disambiguate polysemous meanings in NPN constructions

Assess BERT's sensitivity to form and semantics
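One way to assess sensitivity to form is to generate word-order permutations of a true NPN instance and check whether a classifier still accepts them. The sketch below shows only the permutation step. It assumes a sentence is already tokenized into a list of words and that the start index of the three-token NPN span is known. The function name `permute_npn` is hypothetical, and the exact perturbations used in the paper may differ.

```python
from itertools import permutations


def permute_npn(tokens, start):
    """Return copies of `tokens` with the three-token span at `start` reordered.

    The original order and duplicate reorderings (which arise when both
    nouns are the same word) are skipped.
    """
    span = tokens[start:start + 3]
    if len(span) != 3:
        raise ValueError("span must cover exactly three tokens")
    seen = {tuple(span)}  # never emit the original order
    variants = []
    for order in permutations(span):
        if order in seen:
            continue
        seen.add(order)
        variants.append(tokens[:start] + list(order) + tokens[start + 3:])
    return variants


sentence = ["they", "met", "face", "to", "face"]
for variant in permute_npn(sentence, 2):
    print(" ".join(variant))
# prints:
# they met face face to
# they met to face face
```

Each variant keeps the same words as the original instance, so a classifier that rejects the variants cannot be relying on lexical cues alone.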

Innovation

Methods, ideas, or system contributions that make the work stand out.

Probing BERT for NPN construction semantics

Benchmark dataset with semantic annotations

Probing classifiers achieve decent disambiguation
