🤖 AI Summary
To address hierarchical concept retrieval failures caused by out-of-vocabulary (OOV) terms in large-scale biomedical ontologies such as SNOMED CT, this paper proposes a language model-driven ontology embedding method that jointly encodes concept textual descriptions and hierarchical structural priors to produce generalizable and interpretable concept vector representations. The authors further design a path-aware similarity metric that matches OOV queries to their most direct hypernyms, hyponyms, and ancestral nodes. The contributions are threefold: (1) the first manually annotated OOV benchmark dataset specifically for SNOMED CT; (2) an embedding framework that jointly optimizes semantic coherence and hierarchical plausibility; and (3) empirical results demonstrating significant improvements over strong baselines, including SBERT and lexical matching, with a +12.7% gain in Mean Reciprocal Rank (MRR) on real-world queries. The evaluation is carried out on SNOMED CT, and the approach is presented as extensible to other ontologies.
📝 Abstract
SNOMED CT is a biomedical ontology that represents a large number of concepts hierarchically. Knowledge retrieval in SNOMED CT is critical for its application, but often proves challenging because of language ambiguity, synonymy, and polysemy. The problem is exacerbated when queries are out-of-vocabulary (OOV), i.e., when they have no equivalent match in the ontology. In this work, we focus on hierarchical concept retrieval from SNOMED CT with OOV queries and propose an approach based on language model-based ontology embeddings. For evaluation, we construct OOV queries annotated against SNOMED CT concepts and test the retrieval of the most direct subsumers and their less relevant ancestors. We find that our method outperforms the baselines, including SBERT and two lexical matching methods. Although evaluated against SNOMED CT, the approach is generalisable and can be extended to other ontologies. We release code, tools, and evaluation datasets at https://github.com/jonathondilworth/HR-OOV.
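The evaluation described above ranks ontology concepts for each OOV query and checks where the annotated subsumers appear. As a minimal sketch of how Mean Reciprocal Rank could be computed for such a setup, the Python below uses hypothetical names (`reciprocal_rank`, `mean_reciprocal_rank`) and made-up query and concept identifiers; it is not taken from the released code and does not reproduce the paper's data or results.

```python
from typing import Dict, List, Set


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant concept, or 0.0 if none is retrieved."""
    for position, concept in enumerate(ranked, start=1):
        if concept in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Dict[str, List[str]], gold: Dict[str, Set[str]]
) -> float:
    """Average the reciprocal rank over every annotated query."""
    if not gold:
        return 0.0
    total = sum(
        reciprocal_rank(rankings.get(query, []), relevant)
        for query, relevant in gold.items()
    )
    return total / len(gold)


# Hypothetical toy data: identifiers are placeholders, not SNOMED CT concepts.
rankings = {
    "query_a": ["concept_3", "concept_1", "concept_7"],  # gold found at rank 2
    "query_b": ["concept_2", "concept_9", "concept_4"],  # gold found at rank 1
}
gold = {
    "query_a": {"concept_1"},
    "query_b": {"concept_2"},
}

mrr = mean_reciprocal_rank(rankings, gold)
print(f"MRR = {mrr:.2f}")
```

In this sketch a query whose gold concepts never appear in the ranking contributes 0 to the average, which is one common convention; the paper's own evaluation protocol may differ.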