RELATE: Relation Extraction in Biomedical Abstracts with LLMs and Ontology Constraints

📅 2025-09-23
🤖 AI Summary
Biomedical knowledge graphs (KGs) are incomplete, which hinders drug discovery and clinical decision support. Large language models (LLMs) can extract relations, but their outputs are often non-standardized and not aligned with ontologies, which prevents direct integration into KGs. To address this, the authors propose RELATE, an ontology-constrained, three-stage relation extraction pipeline: (1) ontology preprocessing with predicate embeddings; (2) similarity-based retrieval enhanced with SapBERT; and (3) LLM-based re-ranking of candidate predicates with explicit handling of negated assertions. Evaluated on ChemProt, the method achieves 52% exact match and 94% accuracy@10. Applied to 2,400 HEAL Project abstracts, it rejects irrelevant associations (0.4%) and identifies negated relations. The resulting relations are more semantically faithful and ontology-compatible for KG completion.

📝 Abstract
Biomedical knowledge graphs (KGs) are vital for drug discovery and clinical decision support but remain incomplete. Large language models (LLMs) excel at extracting biomedical relations, yet their outputs lack standardization and alignment with ontologies, limiting KG integration. We introduce RELATE, a three-stage pipeline that maps LLM-extracted relations to standardized ontology predicates using ChemProt and the Biolink Model. The pipeline includes: (1) ontology preprocessing with predicate embeddings, (2) similarity-based retrieval enhanced with SapBERT, and (3) LLM-based reranking with explicit negation handling. This approach transforms relation extraction from free-text outputs to structured, ontology-constrained representations. On the ChemProt benchmark, RELATE achieves 52% exact match and 94% accuracy@10, and in 2,400 HEAL Project abstracts, it effectively rejects irrelevant associations (0.4%) and identifies negated assertions. RELATE captures nuanced biomedical relationships while ensuring quality for KG augmentation. By combining vector search with contextual LLM reasoning, RELATE provides a scalable, semantically accurate framework for converting unstructured biomedical literature into standardized KGs.
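For readers unfamiliar with the two ChemProt metrics quoted above, here is a minimal sketch of how exact match and accuracy@k are commonly computed over ranked predicate candidates. The function names and the toy rankings are illustrative assumptions, not taken from RELATE, and the paper may define the metrics differently (exact match is treated here as accuracy@1).

```python
from typing import Sequence


def accuracy_at_k(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of examples whose gold predicate appears among the top-k candidates."""
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(1 for candidates, g in zip(ranked, gold) if g in candidates[:k])
    return hits / len(gold)


def exact_match(ranked: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Assumption: exact match means the top-ranked candidate equals the gold label."""
    return accuracy_at_k(ranked, gold, k=1)


# Toy, made-up rankings for four extracted relations (not real evaluation data).
ranked = [
    ["biolink:affects", "biolink:treats", "biolink:causes"],
    ["biolink:treats", "biolink:affects", "biolink:causes"],
    ["biolink:causes", "biolink:affects", "biolink:treats"],
    ["biolink:affects", "biolink:causes", "biolink:related_to"],
]
gold = ["biolink:affects", "biolink:affects", "biolink:treats", "biolink:treats"]

em = exact_match(ranked, gold)
acc2 = accuracy_at_k(ranked, gold, k=2)
acc3 = accuracy_at_k(ranked, gold, k=3)
```

On this toy data the gap between exact match and accuracy@3 mirrors, in miniature, the gap between the 52% and 94% figures: the correct predicate is often retrieved but not always ranked first.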
Problem

Research questions and friction points this paper is trying to address.

Relations extracted from biomedical abstracts are not standardized, which blocks direct knowledge graph integration
LLM outputs must be aligned with ontology predicates to yield structured representations
Converting unstructured biomedical literature into standardized knowledge graphs requires a scalable framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Maps LLM-extracted relations to ontology predicates
Uses similarity retrieval enhanced with SapBERT embeddings
Employs LLM-based reranking with negation handling
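The three ideas above can be sketched as a retrieve-then-rerank flow. This is only an illustration under loud assumptions: a bag-of-words vector stands in for SapBERT embeddings, a rule-based stub stands in for the LLM reranker, the predicate descriptions are hand-written rather than loaded from the Biolink Model, and the regex list of negation cues is hypothetical. None of it is the authors' implementation.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Hand-written toy descriptions (assumption); a real system would read them from the ontology.
PREDICATES: Dict[str, str] = {
    "biolink:treats": "treats ameliorates therapy for disease",
    "biolink:causes": "causes induces leads to produces",
    "biolink:affects": "affects influences modulates changes",
    "biolink:interacts_with": "interacts binds associates with",
}

# Hypothetical negation cues; the paper delegates negation handling to the LLM.
NEGATION_CUES = re.compile(r"\b(no|not|never|without|failed to|did not|does not)\b", re.I)


def embed(text: str) -> Counter:
    """Stand-in embedding: bag-of-words counts (the paper uses SapBERT instead)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Stage 1: precompute one embedding per ontology predicate.
PREDICATE_VECS = {pred: embed(desc) for pred, desc in PREDICATES.items()}


def retrieve(relation_phrase: str, k: int = 3) -> List[str]:
    """Stage 2: return the k predicates most similar to the extracted relation phrase."""
    query = embed(relation_phrase)
    ordered = sorted(PREDICATE_VECS, key=lambda p: cosine(query, PREDICATE_VECS[p]), reverse=True)
    return ordered[:k]


def rerank(candidates: List[str], sentence: str) -> dict:
    """Stage 3 placeholder: keeps the retrieval order and flags negation by regex.

    A real reranker would prompt an LLM with the sentence and the candidates.
    """
    return {"predicate": candidates[0], "negated": bool(NEGATION_CUES.search(sentence))}


sentence = "Drug X did not induce apoptosis; in other cells it induces necrosis."
result = rerank(retrieve("induces"), sentence)
```

Splitting the work this way keeps the expensive contextual step confined to a short candidate list, which is the apparent reason for pairing vector search with LLM reasoning.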
Authors

Olawumi Olasunkanmi
Department of Computer Science, University of North Carolina, Chapel Hill, United States
Mathew Satursky
Renaissance Computing Institute, United States
Hong Yi
Renaissance Computing Institute, United States
Chris Bizon
Director of Analytics and Data Science, RENCI, University of North Carolina
Informatics, Next Generation Sequencing, Drug Discovery, Fluid Dynamics
Harlin Lee
School of Data Science and Society, University of North Carolina at Chapel Hill
graphs, manifolds, optimal transport, non-convex optimization, healthcare
Stanley Ahalt
School of Data Science and Society, University of North Carolina, Chapel Hill, United States