Revisiting a Pain in the Neck: A Semantic Reasoning Benchmark for Language Models

📅 2026-04-17

🤖 AI Summary
Language models have not been systematically evaluated on multiword expressions (idioms, noun compounds, and verbal constructions), whose meaning often cannot be read off the individual words. This work proposes SemanticQA, a benchmark suite that consolidates existing multiword expression resources into a unified testbed covering extraction, classification, and interpretation tasks, plus sequential compositions of those tasks. It is designed to evaluate models of diverse architectures and scales. Experiments show substantial performance variation across models, particularly on tasks that require semantic reasoning, which provides both empirical evidence and a benchmark for improving how language models handle non-trivial semantic phrases.

📝 Abstract
We present SemanticQA, an evaluation suite designed to assess language models (LMs) on semantic phrase processing tasks. The benchmark consolidates existing multiword expression (MwE) resources and reorganizes them into a unified testbed. It covers both general lexical phenomena, such as lexical collocations, and three fine-grained categories: idiomatic expressions, noun compounds, and verbal constructions. Through SemanticQA, we assess LMs of diverse architectures and scales on extraction, classification, and interpretation tasks, as well as on sequential compositions of these tasks. Our results reveal substantial performance variation, particularly on tasks requiring semantic reasoning. This variation highlights differences in the reasoning efficacy and semantic understanding of LMs and offers insights for building LMs with stronger comprehension of non-trivial semantic phrases. The evaluation harness and data of SemanticQA are available at https://github.com/jacklanda/SemanticQA.
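
To make the idea of a sequential task composition concrete, here is a minimal sketch in Python of an extraction step feeding a classification step. It is not the SemanticQA harness: the `Example` record layout, the prompt wording, the exact-match scoring, and the `toy_model` stand-in are all assumptions made for illustration, and the real data format and metrics may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical record layout; the actual SemanticQA data format may differ.
@dataclass
class Example:
    sentence: str   # input sentence containing a multiword expression
    phrase: str     # gold expression to extract
    category: str   # gold category label

# A model is assumed to be any callable mapping a prompt to a text answer.
Model = Callable[[str], str]

def extract(model: Model, sentence: str) -> str:
    """Step 1 (assumed prompt): ask the model for the expression."""
    return model(f"Extract the multiword expression from: {sentence}").strip()

def classify(model: Model, phrase: str) -> str:
    """Step 2 (assumed prompt): ask the model for the phrase category."""
    return model(
        "Classify the phrase as idiom, noun compound, "
        f"or verbal construction: {phrase}"
    ).strip()

def evaluate_composition(model: Model, examples: List[Example]) -> Dict[str, float]:
    """Run extraction, feed its output to classification, score each step.

    The chain counts as correct only when both steps are correct, so an
    extraction error can propagate into the composed score.
    """
    ext_hits = cls_hits = chain_hits = 0
    for ex in examples:
        pred_phrase = extract(model, ex.sentence)
        pred_category = classify(model, pred_phrase)  # uses the prediction, not gold
        ext_ok = pred_phrase.lower() == ex.phrase.lower()
        cls_ok = pred_category.lower() == ex.category.lower()
        ext_hits += ext_ok
        cls_hits += cls_ok
        chain_hits += ext_ok and cls_ok
    n = len(examples)
    return {
        "extraction": ext_hits / n,
        "classification": cls_hits / n,
        "chain": chain_hits / n,
    }

def toy_model(prompt: str) -> str:
    """Stand-in for a real LM: it only knows one idiom."""
    if prompt.startswith("Extract"):
        return "kick the bucket" if "kick the bucket" in prompt else "unknown"
    return "idiom" if "kick the bucket" in prompt else "noun compound"

examples = [
    Example("He might kick the bucket soon.", "kick the bucket", "idiom"),
    Example("She bought an olive oil bottle.", "olive oil", "noun compound"),
]
scores = evaluate_composition(toy_model, examples)
print(scores)
```

In this toy run the second example is classified correctly even though extraction failed, so the per-step and chained scores diverge. That gap is the kind of effect a sequential composition is meant to expose, under the scoring assumptions stated above.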
Problem

Research questions and friction points this paper is trying to address.

semantic reasoning
multiword expressions
language models
idiomatic expressions
noun compounds
Innovation

Methods, ideas, or system contributions that make the work stand out.

SemanticQA
multiword expressions
semantic reasoning
language model evaluation
idiomatic expressions