The Medical Metaphors Corpus (MCC)

📅 2025-08-11
🤖 AI Summary
Existing metaphor detection research focuses primarily on general-domain text and lacks dedicated resources for scientific domains such as medicine and biology, which hinders computational understanding of metaphor in scientific discourse. Method: We introduce the Medical Metaphors Corpus (MCC), the first annotated corpus designed specifically for scientific metaphor analysis, comprising 792 instances drawn from peer-reviewed literature, news media, social media, and crowdsourced contributions. MCC provides conceptual metaphor annotations, including explicit source-target domain mappings and graded metaphoricity scores on a 0-7 scale. A dual-track annotation protocol (binary judgment plus strength rating) and a scalable annotation framework are intended to keep annotations consistent across sources. Results: Experiments show that state-of-the-art large language models achieve only modest performance on scientific metaphor detection, which underscores MCC's value as a benchmark for domain-specific metaphor understanding.

📝 Abstract
Metaphor is a fundamental cognitive mechanism that shapes scientific understanding, enabling the communication of complex concepts while potentially constraining paradigmatic thinking. Despite the prevalence of figurative language in scientific discourse, existing metaphor detection resources primarily focus on general-domain text, leaving a critical gap for domain-specific applications. In this paper, we present the Medical Metaphors Corpus (MCC), a comprehensive dataset of 792 annotated scientific conceptual metaphors spanning medical and biological domains. MCC aggregates metaphorical expressions from diverse sources including peer-reviewed literature, news media, social media discourse, and crowdsourced contributions, providing both binary and graded metaphoricity judgments validated through human annotation. Each instance includes source-target conceptual mappings and perceived metaphoricity scores on a 0-7 scale, establishing the first annotated resource for computational scientific metaphor research. Our evaluation demonstrates that state-of-the-art language models achieve modest performance on scientific metaphor detection, revealing substantial room for improvement in domain-specific figurative language understanding. MCC enables multiple research applications including metaphor detection benchmarking, quality-aware generation systems, and patient-centered communication tools.
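As a rough illustration of the annotation layers described in the abstract, a single MCC instance could be modeled as the record below. This is only a sketch: the class name, field names, and the example sentence are hypothetical and are not taken from the released corpus. Only the 0-7 score range, the binary judgment, and the source-target mapping come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaphorInstance:
    """Hypothetical record layout; field names are assumptions, not the MCC schema."""

    text: str             # the expression being annotated
    source_domain: str    # concept the metaphor borrows from
    target_domain: str    # concept the metaphor describes
    is_metaphor: bool     # binary metaphoricity judgment
    metaphoricity: float  # graded judgment on the 0-7 scale
    origin: str           # e.g. "literature", "news", "social", "crowd"

    def __post_init__(self) -> None:
        # Reject scores outside the 0-7 scale stated in the abstract.
        if not 0.0 <= self.metaphoricity <= 7.0:
            raise ValueError(
                f"metaphoricity must be in [0, 7], got {self.metaphoricity}"
            )


# Invented example sentence, for illustration only.
example = MetaphorInstance(
    text="The immune system attacks invading pathogens.",
    source_domain="war",
    target_domain="immune response",
    is_metaphor=True,
    metaphoricity=5.0,
    origin="news",
)
```

Keeping the binary and graded judgments as separate fields mirrors the two kinds of annotation the abstract mentions, rather than deriving one from the other.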
Problem

Research questions and friction points this paper is trying to address.

Lack of domain-specific metaphor detection resources in science
Need for an annotated dataset to study medical and biological metaphors
Improving computational understanding of scientific figurative language
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive annotated medical metaphor dataset
Diverse sources with human-validated metaphoricity scores
First resource for computational scientific metaphor research
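For the detection benchmarking use mentioned in the abstract, binary predictions from a model could be scored against the human judgments along these lines. This is a minimal sketch that assumes F1 on the metaphor class as the metric; the page does not state which metric the authors report, and the function name is hypothetical.

```python
from typing import Sequence


def binary_f1(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """F1 for the positive (metaphor) class; the metric choice is an assumption."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    true_pos = sum(g and p for g, p in zip(gold, predicted))
    false_pos = sum((not g) and p for g, p in zip(gold, predicted))
    false_neg = sum(g and (not p) for g, p in zip(gold, predicted))

    # With no true positives, precision and recall are zero or undefined.
    if true_pos == 0:
        return 0.0

    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy labels, not corpus data: one hit, one miss, one false alarm.
score = binary_f1(
    gold=[True, True, False, False],
    predicted=[True, False, True, False],
)
```

The graded 0-7 scores would need a different measure, such as a rank correlation between model and human ratings, which is not shown here.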