🤖 AI Summary
This study investigates large language models' (LLMs) capacity to learn and generalize the form-meaning mappings posited by construction grammar, and exposes systematic deficits in their abstraction from concrete to schematic constructions in natural language inference (NLI).
Method: We introduce ConTest-NLI, the first construction-driven, large-scale NLI benchmark (80K instances), covering eight English constructions. It combines template-based generation with model-in-the-loop filtering to synthesize adversarial, schematic examples.
Contribution/Results: While state-of-the-art LLMs achieve 88% accuracy on naturalistic data, performance drops sharply to 64% on adversarial and schematic constructions. Construction-aware fine-tuning yields up to 9% absolute improvement, demonstrating both the necessity and the feasibility of explicit constructional modeling. This work establishes the first scalable, construction-grammar-oriented evaluation framework for LLMs, revealing fundamental limitations in their grasp of abstract syntactic structure.
📝 Abstract
We probe large language models' ability to learn deep form-meaning mappings as defined by construction grammar. We introduce the ConTest-NLI benchmark of 80k sentences covering eight English constructions, ranging from highly lexicalized to highly schematic. Our pipeline generates diverse synthetic NLI triples via templating and a model-in-the-loop filter, combined with human validation to ensure both difficulty and label reliability. Zero-shot tests on leading LLMs reveal a 24-point drop in accuracy between naturalistic (88%) and adversarial data (64%), with schematic patterns proving hardest. Fine-tuning on a subset of ConTest-NLI yields up to 9% absolute improvement, yet our results highlight persistent abstraction gaps in current LLMs and offer a scalable framework for evaluating construction-informed learning.
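The template-plus-filter pipeline can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the templates, lexical fillers, and the `predict` function (which stands in for a real NLI model) are all hypothetical placeholders.

```python
import random

# Hypothetical templates for one construction (caused-motion as an example);
# each entry is (premise template, hypothesis template, gold label).
TEMPLATES = [
    ("{agent} sneezed the {object} off the {surface}.",
     "The {object} moved off the {surface}.", "entailment"),
    ("{agent} sneezed the {object} off the {surface}.",
     "{agent} kept the {object} on the {surface}.", "contradiction"),
]

# Illustrative filler lexicon; the real benchmark's vocabulary is larger.
FILLERS = {
    "agent": ["Kim", "The child"],
    "object": ["napkin", "feather"],
    "surface": ["table", "shelf"],
}

def generate_triples(n, seed=0):
    """Instantiate templates with random lexical fillers to get NLI triples."""
    rng = random.Random(seed)
    triples = []
    for _ in range(n):
        prem_t, hyp_t, label = rng.choice(TEMPLATES)
        slots = {k: rng.choice(v) for k, v in FILLERS.items()}
        triples.append((prem_t.format(**slots), hyp_t.format(**slots), label))
    return triples

def model_in_the_loop_filter(triples, predict):
    """Keep only triples the filter model mislabels, i.e. adversarial cases.
    `predict(premise, hypothesis)` is a stand-in for an NLI model's output."""
    return [t for t in triples if predict(t[0], t[1]) != t[2]]
```

For example, filtering with a degenerate model that always predicts "entailment" would retain exactly the contradiction-labeled triples; with a real NLI model, the retained set is the adversarial subset that then goes to human validation.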