Text-Attributed Knowledge Graph Enrichment with Large Language Models for Medical Concept Representation

📅 2026-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two obstacles to learning representations of medical concepts from electronic health records (EHRs): cross-type dependencies are missing from existing resources, and clinical text semantics are hard to integrate with graph structure. The authors construct a medical knowledge graph that combines EHR-derived statistical associations with relations inferred by large language models (LLMs). Prompting LLMs to generate node descriptions and edge rationales then enriches the graph into a text-attributed structure. A LoRA-finetuned LLaMA text encoder and a heterogeneous graph neural network are jointly trained to produce unified embeddings that capture both textual and structural information. In experiments on MIMIC-III and MIMIC-IV, the method consistently improves clinical prediction performance, and the resulting embeddings serve as plug-in concept encoders for standard EHR analysis pipelines.
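To make the idea of "unified embeddings that capture both textual and structural information" concrete, here is a minimal forward-pass sketch in plain Python. It is not the authors' architecture: the 2-d vectors stand in for text-encoder outputs, the relation names (`treats`, `performed_for`) and node IDs are hypothetical, and a single scalar weight per relation stands in for the learned relation-specific transforms a heterogeneous GNN would use. No training is shown.

```python
# Stand-in text embeddings. In the described method these would come from a
# LoRA-tuned LLaMA encoder; here they are hand-written 2-d vectors.
text_emb = {
    "dx:D1": [1.0, 0.0],
    "rx:M1": [0.0, 1.0],
    "px:P1": [1.0, 1.0],
}

# Typed edges as (source, relation, target). Relation names are hypothetical.
edges = [
    ("rx:M1", "treats", "dx:D1"),
    ("px:P1", "performed_for", "dx:D1"),
]

# One scalar per relation type (an assumption made for brevity; a real
# heterogeneous GNN would learn a weight matrix per relation).
rel_weight = {"treats": 0.5, "performed_for": 0.25}


def fuse(text_emb, edges, rel_weight, self_weight=1.0):
    """One round of relation-weighted mean aggregation over incoming edges."""
    dim = len(next(iter(text_emb.values())))
    out = {}
    for node, vec in text_emb.items():
        incoming = [(rel, src) for src, rel, dst in edges if dst == node]
        agg = [0.0] * dim
        for rel, src in incoming:
            for i in range(dim):
                agg[i] += rel_weight[rel] * text_emb[src][i] / len(incoming)
        # Combine the node's own text embedding with its neighborhood signal.
        out[node] = [self_weight * vec[i] + agg[i] for i in range(dim)]
    return out


fused = fuse(text_emb, edges, rel_weight)
```

In this toy graph, the diagnosis node absorbs signal from its medication and procedure neighbors, while nodes without incoming edges keep their text embedding unchanged.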

📝 Abstract

In electronic health record (EHR) mining, learning high-quality representations of medical concepts (e.g., standardized diagnosis, medication, and procedure codes) is fundamental for downstream clinical prediction. However, robust concept representation learning is hindered by two key challenges: (i) clinically important cross-type dependencies (e.g., diagnosis-medication and medication-procedure relations) are often missing or incomplete in existing ontology resources, limiting the ability to model complex EHR patterns; and (ii) rich clinical semantics are often missing from structured resources, and even when available as text, are difficult to integrate with knowledge graph (KG) structure for representation learning. To address these challenges, we present CoMed, an LLM-empowered graph learning framework for medical concept representation. CoMed first builds a global KG over medical codes by combining statistically reliable associations mined from EHRs with type-constrained LLM prompting to infer semantic relations. It then utilizes LLMs to enrich the KG into a text-attributed graph by generating node descriptions and edge rationales, providing semantic signals for both concepts and their relationships. Finally, CoMed jointly trains a LoRA-tuned LLaMA text encoder with a heterogeneous GNN, fusing text semantics and graph structure into unified concept embeddings. Extensive experiments on MIMIC-III and MIMIC-IV show that CoMed consistently improves prediction performance and serves as an effective plug-in concept encoder for standard EHR pipelines.
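The first step, mining statistically reliable cross-type associations from EHRs, can be illustrated with a small sketch. The abstract does not say which statistic CoMed uses; the version below assumes a minimum co-occurrence count plus pointwise mutual information (PMI), and every code, threshold, and function name is hypothetical. The LLM prompting step that would then label the surviving pairs with semantic relations is not shown.

```python
import math
from collections import Counter
from itertools import combinations

# Toy visits: each visit is a set of (type, code) pairs. All codes are made up.
visits = [
    {("dx", "D1"), ("rx", "M1"), ("px", "P1")},
    {("dx", "D1"), ("rx", "M1")},
    {("dx", "D2"), ("rx", "M2")},
    {("dx", "D1"), ("rx", "M1"), ("rx", "M2")},
    {("dx", "D2"), ("px", "P1")},
]


def mine_cross_type_edges(visits, min_support=2, min_pmi=0.0):
    """Keep cross-type code pairs that co-occur often enough and have PMI
    above a threshold. Both thresholds are illustrative assumptions."""
    n = len(visits)
    node_count = Counter()
    pair_count = Counter()
    for visit in visits:
        node_count.update(visit)
        for a, b in combinations(sorted(visit), 2):
            if a[0] != b[0]:  # type constraint: only pairs of different types
                pair_count[(a, b)] += 1

    edges = {}
    for (a, b), c in pair_count.items():
        if c < min_support:
            continue
        # PMI = log( P(a, b) / (P(a) * P(b)) ), with visit-level frequencies.
        pmi = math.log((c * n) / (node_count[a] * node_count[b]))
        if pmi > min_pmi:
            edges[(a, b)] = pmi
    return edges


edges = mine_cross_type_edges(visits)
```

On this toy data only the diagnosis-medication pair `D1`/`M1` survives, since it is the only cross-type pair that appears in at least two visits.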

Problem

Research questions and friction points this paper is trying to address.

medical concept representation

knowledge graph enrichment

cross-type dependencies

clinical semantics

electronic health records

Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-Attributed Knowledge Graph

Large Language Models

Medical Concept Representation