AI Summary
Biomedical knowledge graphs (KGs) are incomplete, which hinders drug discovery and clinical decision support. Large language models (LLMs) can extract relations from text, but their outputs are often non-standardized and not aligned with any ontology, which impedes direct integration into KGs. To address this, we propose RELATE, an ontology-constrained, three-stage framework that maps LLM-extracted relations to standardized predicates from ChemProt and the Biolink Model: (1) ontology preprocessing with predicate embeddings; (2) similarity-based retrieval of candidate predicates, enhanced with SapBERT; and (3) LLM-based contextual reranking of the candidates with explicit handling of negated assertions. On ChemProt, RELATE achieves 52% exact match and 94% accuracy@10. Applied to 2,400 HEAL Project abstracts, it rejects irrelevant associations (0.4%) and identifies negated relations. The result is structured relation output with better semantic fidelity and ontology compatibility for KG completion.
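The retrieval step can be pictured as nearest-neighbour search over predicate embeddings. The following is a minimal sketch, assuming each ontology predicate and each extracted relation has already been encoded as a vector and that cosine similarity is the scoring function. The three-dimensional vectors and the ChemProt-style predicate names are hypothetical placeholders, not SapBERT output or the paper's actual predicate set.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec, predicate_vecs, k=10):
    """Rank predicates by cosine similarity to the query and return the k best names."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in predicate_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scored[:k]]


# Hypothetical toy embeddings; a real pipeline would encode predicate
# names or definitions with a biomedical encoder such as SapBERT.
PREDICATES = {
    "INHIBITOR": [0.9, 0.1, 0.0],
    "ACTIVATOR": [0.1, 0.9, 0.0],
    "SUBSTRATE": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of an extracted relation such as "X suppresses Y".
query = [0.8, 0.2, 0.1]

print(retrieve_top_k(query, PREDICATES, k=2))  # ['INHIBITOR', 'ACTIVATOR']
```

In the described pipeline, the top-ranked candidates from this step would then be passed to an LLM for contextual reranking; the sketch stops at retrieval.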
Abstract
Biomedical knowledge graphs (KGs) are vital for drug discovery and clinical decision support but remain incomplete. Large language models (LLMs) excel at extracting biomedical relations, yet their outputs lack standardization and alignment with ontologies, limiting KG integration. We introduce RELATE, a three-stage pipeline that maps LLM-extracted relations to standardized ontology predicates using ChemProt and the Biolink Model. The pipeline includes: (1) ontology preprocessing with predicate embeddings, (2) similarity-based retrieval enhanced with SapBERT, and (3) LLM-based reranking with explicit negation handling. This approach transforms relation extraction from free-text outputs to structured, ontology-constrained representations. On the ChemProt benchmark, RELATE achieves 52% exact match and 94% accuracy@10, and on 2,400 HEAL Project abstracts, it effectively rejects irrelevant associations (0.4%) and identifies negated assertions. RELATE captures nuanced biomedical relationships while ensuring quality for KG augmentation. By combining vector search with contextual LLM reasoning, RELATE provides a scalable, semantically accurate framework for converting unstructured biomedical literature into standardized KGs.
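The two reported ChemProt metrics can be made concrete with a short sketch. It assumes the standard definitions: exact match counts a relation as correct when the top-ranked predicate equals the gold label, and accuracy@k counts it as correct when the gold label appears anywhere among the top k candidates (k = 10 in the reported result). The labels and rankings below are hypothetical examples, not ChemProt data.

```python
def exact_match(ranked_predictions, gold_labels):
    """Fraction of examples whose top-ranked predicate equals the gold label."""
    if not gold_labels:
        return 0.0
    hits = sum(
        1
        for ranked, gold in zip(ranked_predictions, gold_labels)
        if ranked and ranked[0] == gold
    )
    return hits / len(gold_labels)


def accuracy_at_k(ranked_predictions, gold_labels, k=10):
    """Fraction of examples whose gold label appears among the top k candidates."""
    if not gold_labels:
        return 0.0
    hits = sum(
        1
        for ranked, gold in zip(ranked_predictions, gold_labels)
        if gold in ranked[:k]
    )
    return hits / len(gold_labels)


# Hypothetical ranked candidate lists for three extracted relations.
ranked = [
    ["INHIBITOR", "ANTAGONIST", "ACTIVATOR"],
    ["AGONIST", "ACTIVATOR", "SUBSTRATE"],
    ["SUBSTRATE", "INHIBITOR", "AGONIST"],
]
gold = ["INHIBITOR", "ACTIVATOR", "PRODUCT_OF"]

print(exact_match(ranked, gold))         # 1 of 3 correct at rank 1
print(accuracy_at_k(ranked, gold, k=2))  # 2 of 3 correct within the top 2
```

Because accuracy@k only asks whether the gold label is somewhere in the candidate list, it is always at least as high as exact match, which is consistent with the gap between the 52% and 94% figures.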