IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions

📅 2026-05-21
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the difficulty language models have in understanding idioms, whose meanings cannot be inferred from their literal constituents. To this end, the authors introduce IdioLink, described here as the first cross-surface semantic retrieval benchmark designed specifically for idioms. It comprises 10,700 documents and 2,140 queries spanning 107 idioms, in which idiomatic expressions are linked to equivalent meanings expressed in literal or paraphrased form. Spans conveying the core meaning are annotated to support fine-grained evaluation. Experiments with strong embedding models, including BGE, E5, Contriever, and Qwen, show that these models struggle to capture the semantic equivalence between idiomatic and literal expressions and tend to rely on topical and shallow surface cues rather than genuine semantic understanding. The study thus contributes a new benchmark and analytical framework for advancing cross-surface semantic comprehension in natural language processing.
📝 Abstract
Idioms pose a fundamental challenge for language models, as their meaning cannot be inferred from surface form alone. Understanding such expressions, therefore, requires semantic abstraction beyond lexical overlap. We introduce IdioLink, a retrieval benchmark designed to test whether models can link idiomatic expressions to conceptually equivalent meanings expressed in literal or paraphrased forms. IdioLink comprises 10,700 documents and 2,140 queries, spanning 107 idioms with both literal and figurative uses. Each document and query is annotated with spans that convey the core meaning. Evaluating strong embedding baselines (e.g., BGE, E5, Contriever, and Qwen), we show that current models struggle to retrieve equivalent meanings across divergent surface realizations, relying instead on topical and shallow semantic cues. IdioLink exposes key gaps in idiom-aware semantic retrieval and provides a challenging testbed for future models.
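The abstract does not state which retrieval metrics IdioLink reports. As a rough illustration only, the sketch below shows how a dense-retrieval benchmark of this kind is commonly scored, assuming cosine similarity for ranking and Recall@k and mean reciprocal rank (MRR) as metrics. The three-dimensional vectors, document ids, and sentences are hypothetical toy data, not IdioLink examples or outputs of any real model. In practice the vectors would come from an embedding model such as those named above.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def rank_documents(query_vec: Sequence[float],
                   doc_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return document ids sorted by descending cosine similarity to the query."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)


def recall_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranking: List[str], relevant: Set[str]) -> float:
    """1 / (rank of the first relevant document), or 0.0 if none is retrieved."""
    for i, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / i
    return 0.0


def evaluate(query_vecs: Dict[str, Sequence[float]],
             doc_vecs: Dict[str, Sequence[float]],
             qrels: Dict[str, Set[str]],
             k: int = 1) -> Dict[str, float]:
    """Average Recall@k and MRR over all queries."""
    recalls, rrs = [], []
    for qid, qvec in query_vecs.items():
        ranking = rank_documents(qvec, doc_vecs)
        recalls.append(recall_at_k(ranking, qrels[qid], k))
        rrs.append(reciprocal_rank(ranking, qrels[qid]))
    n = len(query_vecs)
    return {f"recall@{k}": sum(recalls) / n, "mrr": sum(rrs) / n}


# Hypothetical toy embeddings (not from IdioLink or any real model).
docs = {
    "doc_died": [0.2, 0.9, 0.1],     # "He passed away last night."
    "doc_pail": [0.95, 0.0, 0.1],    # "He kicked the bucket of water across the yard." (literal use)
    "doc_secret": [0.0, 0.1, 0.9],   # "She revealed the surprise early."
}
queries = {
    "q_kick": [0.9, 0.1, 0.0],       # "The old farmer kicked the bucket." (figurative: died)
    "q_cat": [0.1, 0.0, 0.95],       # "She let the cat out of the bag about the party."
}
qrels = {"q_kick": {"doc_died"}, "q_cat": {"doc_secret"}}

print(rank_documents(queries["q_kick"], docs))  # ['doc_pail', 'doc_died', 'doc_secret']
print(evaluate(queries, docs, qrels, k=1))      # {'recall@1': 0.5, 'mrr': 0.75}
```

The toy vectors are deliberately chosen so that the figurative query "kicked the bucket" lands closest to the literal "bucket of water" document, a small instance of the surface-cue failure the abstract describes, while the second query is retrieved correctly.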
Problem

Research questions and friction points this paper is trying to address.

idioms
semantic retrieval
literal expressions
figurative language
meaning abstraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

idiom-aware retrieval
semantic abstraction
cross-form meaning equivalence
retrieval benchmark
figurative language understanding
🔎 Similar Papers
Kai Golan Hashiloni
Data Science Institute, Reichman University, Herzliya, Israel; Efi Arazi School of Computer Science, Reichman University, Herzliya, Israel
Daniel Fadlon
Data Science Institute, Reichman University, Herzliya, Israel; Efi Arazi School of Computer Science, Reichman University, Herzliya, Israel
Lior Livyatan
Data Science Institute, Reichman University, Herzliya, Israel; Efi Arazi School of Computer Science, Reichman University, Herzliya, Israel
Ofri Hefetz
Data Science Institute, Reichman University, Herzliya, Israel; Efi Arazi School of Computer Science, Reichman University, Herzliya, Israel
Jiahuan Pei
Assistant professor at Vrije Universiteit Amsterdam (VU Amsterdam)
Dialogue Systems, Natural Language Processing, Information Retrieval, Machine Learning, Open Science
Kfir Bar
Efi Arazi School of Computer Science, Reichman University (IDC Herzliya)
Natural Language Processing