🤖 AI Summary
This work addresses a limitation of existing continual graph learning research, which relies predominantly on synthetic partitions of static knowledge graphs and therefore fails to capture the asynchronous, structured evolution of real-world biomedical knowledge graphs. To bridge this gap, the authors introduce PrimeKG-CL, a continual graph learning benchmark grounded in real temporal dynamics. It integrates nine authoritative databases across two temporal snapshots (June 2021 and July 2023), multimodal node features, and ten entity-type-grouped tasks. Its test sets are stratified into persistent, newly added, and removed edges, so that retention of still-valid knowledge can be assessed separately from forgetting of outdated knowledge. Systematic evaluation shows that multimodal features improve entity-level task performance by up to 60%; among the decoders evaluated, only DistMult clearly separates persistent from deprecated knowledge; decoder choice and continual learning strategy interact strongly; and one recent continual knowledge graph embedding framework (IncDE) failed to scale to the benchmark's 5.67M-triple base task.
📝 Abstract
Biomedical knowledge graphs (KGs) underpin drug repurposing and clinical decision support, yet the upstream ontologies they depend on update on independent cycles, adding millions of edges and deprecating hundreds of thousands more between releases. Existing continual graph learning (CGL), however, has been studied almost exclusively on synthetic random splits of static, generic KGs, a regime that cannot reproduce the asynchronous, structured evolution that real biomedical KGs undergo. To address this gap, we introduce PrimeKG-CL, a CGL benchmark built from nine authoritative biomedical databases (129K+ nodes, 8.1M+ edges, 10 node types, 30 relation types) with two genuine temporal snapshots (June 2021 and July 2023; 5.83M edges added, 889K removed, 7.21M persistent), 10 entity-type-grouped tasks, multimodal node features, and a per-task persistent/added/removed test stratification. Across three evaluation tasks (biomedical relationship prediction, entity classification, and knowledge graph question answering, KGQA), we evaluate six continual learning (CL) strategies across four knowledge graph embedding (KGE) decoders, plus LKGE, an LLM-RAG agent, and CMKL. We find that decoder choice and CL strategy interact strongly: no single strategy performs best across all decoders, and mismatched combinations can significantly degrade performance. Moreover, only DistMult exhibits a clear separation between persistent and deprecated knowledge, indicating that standard metrics conflate retention of still-valid facts with failure to forget outdated ones; this separation is absent under RotatE. In addition, multimodal features improve entity-level tasks by up to 60%, and a recent continual KGE (CKGE) framework, IncDE, failed to scale to our 5.67M-triple base task in five attempts with up to 350 GB of RAM. Data, pipeline, baselines, and the stratified split are released openly.
Dataset: huggingface.co/datasets/yradwan147/PrimeKGCL | Code: github.com/yradwan147/primekg-cl-neurips2026
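
The persistent/added/removed stratification mentioned in the abstract can be pictured as set operations over the triples of two snapshots. The following is only a minimal sketch under that assumption, not the authors' released pipeline: the `stratify` function, the `Stratification` type, and the toy triples are all hypothetical names invented for illustration.

```python
from typing import Iterable, NamedTuple

# A triple is (head, relation, tail); identifiers below are made up.
Triple = tuple[str, str, str]


class Stratification(NamedTuple):
    persistent: set[Triple]  # present in both snapshots
    added: set[Triple]       # present only in the newer snapshot
    removed: set[Triple]     # present only in the older snapshot


def stratify(old_snapshot: Iterable[Triple],
             new_snapshot: Iterable[Triple]) -> Stratification:
    """Split the edges of two snapshots into three disjoint strata."""
    old, new = set(old_snapshot), set(new_snapshot)
    return Stratification(
        persistent=old & new,
        added=new - old,
        removed=old - new,
    )


# Toy data (hypothetical, not taken from the benchmark).
old_edges = [
    ("drug:A", "treats", "disease:X"),
    ("gene:G1", "associated_with", "disease:X"),
]
new_edges = [
    ("drug:A", "treats", "disease:X"),
    ("drug:B", "treats", "disease:Y"),
]

strata = stratify(old_edges, new_edges)
```

Reporting a metric separately on each stratum is what allows retention of still-valid facts (persistent) to be told apart from failure to forget deprecated ones (removed), a distinction that a single pooled score would hide.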