Hierarchical Retrieval with Out-Of-Vocabulary Queries: A Case Study on SNOMED CT

📅 2025-11-17
🤖 AI Summary
To address hierarchical concept retrieval failures caused by out-of-vocabulary (OOV) terms in large-scale biomedical ontologies such as SNOMED CT, this paper proposes a language model-driven ontology embedding method that jointly encodes concept textual descriptions and hierarchical structural priors to produce generalizable and interpretable concept vector representations. It further designs a path-aware similarity metric that matches OOV queries to their most direct hypernyms, hyponyms, and ancestral nodes. The stated contributions are threefold: (1) the first manually annotated OOV benchmark dataset specifically for SNOMED CT; (2) an embedding framework that jointly optimizes semantic coherence and hierarchical plausibility; and (3) empirical results showing significant improvements over strong baselines, including SBERT and lexical matching, with a +12.7% gain in Mean Reciprocal Rank (MRR) on real-world queries, supporting the method's cross-ontology applicability and robustness.
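For readers unfamiliar with the metric named above, the following is a minimal sketch of the standard Mean Reciprocal Rank definition. It is not the authors' evaluation code; the function name `mean_reciprocal_rank` and the toy concept labels are assumptions made for illustration.

```python
from typing import Hashable, Sequence, Set


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    relevant: Sequence[Set[Hashable]],
) -> float:
    """Average of 1/rank of the first relevant item per query (0 if none is retrieved)."""
    if len(rankings) != len(relevant):
        raise ValueError("rankings and relevant must have the same length")
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, gold in zip(rankings, relevant):
        # Ranks are 1-based: the top result contributes 1/1, the second 1/2, ...
        for position, concept in enumerate(ranked, start=1):
            if concept in gold:
                total += 1.0 / position
                break
    return total / len(rankings)


# Hypothetical rankings for three queries (labels are placeholders, not SNOMED CT IDs).
rankings = [["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"]]
relevant = [{"A"}, {"A"}, {"Z"}]
score = mean_reciprocal_rank(rankings, relevant)
```

In this toy case the first relevant concept appears at rank 1, rank 2, and not at all, so the per-query reciprocal ranks are 1, 0.5, and 0.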

📝 Abstract
SNOMED CT is a large-scale biomedical ontology that organises its concepts hierarchically. Knowledge retrieval in SNOMED CT is critical for its application, but often proves challenging due to language ambiguity, synonymy, polysemy, and similar issues. The problem is exacerbated when queries are out-of-vocabulary (OOV), i.e., when they have no equivalent match in the ontology. In this work, we focus on hierarchical concept retrieval from SNOMED CT with OOV queries, and propose an approach based on language model-based ontology embeddings. For evaluation, we construct OOV queries annotated against SNOMED CT concepts, and test retrieval of both the most direct subsumers and their less relevant ancestors. We find that our method outperforms the baselines, which include SBERT and two lexical matching methods. Although evaluated on SNOMED CT, the approach is generalisable and can be extended to other ontologies. We release code, tools, and evaluation datasets at https://github.com/jonathondilworth/HR-OOV.
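The distinction the abstract draws between the most direct subsumers and more distant ancestors can be illustrated on a toy is-a hierarchy. This is only a sketch under stated assumptions: the `PARENTS` mapping, its concept names, and the helper `ancestors_by_distance` are invented for illustration and are not taken from SNOMED CT or from the authors' released code.

```python
from collections import deque
from typing import Dict, List

# Hypothetical toy is-a hierarchy: child -> list of direct parents.
PARENTS: Dict[str, List[str]] = {
    "viral pneumonia": ["pneumonia", "viral infection"],
    "pneumonia": ["lung disease"],
    "viral infection": ["infectious disease"],
    "lung disease": ["disease"],
    "infectious disease": ["disease"],
    "disease": [],
}


def ancestors_by_distance(concept: str, parents: Dict[str, List[str]]) -> Dict[str, int]:
    """Breadth-first walk up the hierarchy; returns ancestor -> minimum hop count."""
    distance: Dict[str, int] = {}
    queue = deque([(concept, 0)])
    while queue:
        node, hops = queue.popleft()
        for parent in parents.get(node, []):
            # BFS reaches each ancestor first via its shortest upward path.
            if parent not in distance:
                distance[parent] = hops + 1
                queue.append((parent, hops + 1))
    return distance


hops = ancestors_by_distance("viral pneumonia", PARENTS)
direct_subsumers = sorted(c for c, d in hops.items() if d == 1)
distant_ancestors = sorted(c for c, d in hops.items() if d > 1)
```

Here the concepts one hop up play the role of direct subsumers, while those further up are the broader ancestors; a retrieval evaluation could score the two groups separately.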
Problem

Research questions and friction points this paper is trying to address.

Retrieving hierarchical concepts from SNOMED CT using out-of-vocabulary queries
Addressing language ambiguity and synonym challenges in biomedical ontology retrieval
Developing ontology embedding methods that match queries to concepts when no equivalent concept exists in the ontology
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses language model-based ontology embeddings
Retrieves hierarchical concepts from SNOMED CT
Handles out-of-vocabulary biomedical queries effectively
Jonathon Dilworth
University of Manchester, Manchester, UK
Hui Yang
University of Manchester, Manchester, UK
Jiaoyan Chen
Department of Computer Science, University of Manchester
Knowledge Graph, Ontology, Machine Learning, Large Language Model
Yongsheng Gao
SNOMED International, London, UK