Knowledge Graph Modulated Deep Learning for Limited-Sample Clinical Data Analysis

📅 2026-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
In clinical studies with limited samples, most biomedical AI models cannot use graph-encoded biological knowledge directly, which limits both performance and interpretability. This work proposes the Graph-in-Graph (GiG) framework, which models each patient as a modular graph: edges come from multi-source biological knowledge graphs, such as pathways and regulatory networks, and node features come from patient-specific omics data. A knowledge-graph-modulated graph neural network then learns patient-level representations. Evaluated on cohorts of nearly 9,700 patients across five clinical tasks, GiG consistently outperforms traditional and state-of-the-art methods, with the largest gains in low-data regimes; on prostate cancer diagnosis, it improves macro-F1 by up to 49 percentage points relative to competing methods.
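Since the headline result is stated in macro-F1 percentage points, a small sketch of the metric may help. Macro-F1 averages the per-class F1 scores with equal weight, so a model that ignores a rare class is penalized heavily. The labels and predictions below are invented toy values, not data from the paper, and the function name `macro_f1` is only illustrative.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy imbalanced task (assumed values): 8 negatives, 2 positives.
y_true = [0] * 8 + [1] * 2
always_negative = [0] * 10             # ignores the rare class
mixed = [0] * 7 + [1] + [1, 1]         # one false positive, no misses

baseline = macro_f1(y_true, always_negative)
improved = macro_f1(y_true, mixed)
gain_in_points = 100 * (improved - baseline)  # difference in percentage points
```

A gain reported in percentage points is the plain difference between two scores on the 0 to 100 scale, not a relative change.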
📝 Abstract
Biological systems are governed by structured molecular interactions, where pathways, regulatory circuits, and functional gene relationships shape cellular behavior and disease progression. Much of this knowledge is naturally represented as graphs. However, most biomedical AI models cannot directly use graph-encoded biological knowledge and instead require compressed low-dimensional representations, which can lose important structure and reduce performance, especially in limited-sample clinical studies. Here, we introduce Graph-in-Graph (GiG), a knowledge graph-modulated deep learning framework for data-efficient clinical prediction. GiG represents each patient as a standalone modular graph, in which curated biological knowledge graphs define edges and patient-specific measurements, such as gene expression, define node features. This design allows multiple biological knowledge graphs to be integrated while preserving gene-gene interactions and pathway topology during patient-level representation learning. Across cohorts comprising nearly 9,700 patients and five clinical tasks, including liquid biopsy cancer detection, prostate cancer diagnosis, and 32-class pan-cancer classification, GiG consistently outperforms traditional and state-of-the-art methods, with the largest gains in limited-sample settings. On the challenging prostate cancer diagnosis task, GiG improves macro-F1 by up to 49 percentage points relative to competing methods. Control experiments replacing real pathway graphs with random topologies confirm that these gains arise from biologically grounded knowledge graph structure rather than graph modeling alone. These findings show that knowledge graph-modulated deep learning can improve robustness, interpretability, and sample efficiency in clinical data analysis, and provide a principled framework for integrating biological knowledge graphs into predictive modeling.
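The patient-as-graph idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the gene list, edge list, expression values, and the function names `build_patient_graph`, `propagate`, and `random_topology` are all hypothetical, and a single mean-aggregation step stands in for whatever graph neural network GiG actually uses. The last function mimics the spirit of the random-topology control by drawing a random graph with the same number of edges.

```python
import numpy as np


def build_patient_graph(genes, knowledge_edges, expression):
    """Edges come from a knowledge graph; node features come from the patient."""
    index = {g: i for i, g in enumerate(genes)}
    n = len(genes)
    adj = np.zeros((n, n))
    for a, b in knowledge_edges:
        if a in index and b in index:
            adj[index[a], index[b]] = 1.0
            adj[index[b], index[a]] = 1.0  # assumption: undirected interactions
    feats = np.array([[expression.get(g, 0.0)] for g in genes])
    return adj, feats


def propagate(adj, feats):
    """One mean-aggregation message-passing step with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    deg = a_hat.sum(axis=1, keepdims=True)
    return (a_hat / deg) @ feats


def random_topology(adj, rng):
    """Control graph: same node set and edge count, random placement."""
    n = adj.shape[0]
    rows, cols = np.triu_indices(n, k=1)
    n_edges = int(adj[rows, cols].sum())
    chosen = rng.choice(len(rows), size=n_edges, replace=False)
    rand = np.zeros_like(adj)
    rand[rows[chosen], cols[chosen]] = 1.0
    return rand + rand.T  # symmetrize


# Toy inputs (illustrative only, not taken from the paper).
genes = ["TP53", "MDM2", "AR", "KLK3"]
edges = [("TP53", "MDM2"), ("AR", "KLK3")]
patient = {"TP53": 2.0, "MDM2": 4.0, "AR": 1.0, "KLK3": 3.0}

adj, feats = build_patient_graph(genes, edges, patient)
hidden = propagate(adj, feats)
control = random_topology(adj, np.random.default_rng(0))
hidden_control = propagate(control, feats)
```

Every patient shares the same adjacency matrix and differs only in `feats`, which is what lets prior knowledge constrain the model when few samples are available. Comparing `hidden` with `hidden_control` shows how the same measurements yield different representations once the biological wiring is replaced.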
Problem

Research questions and friction points this paper is trying to address.

Knowledge Graph
Limited-Sample Clinical Data
Biological Networks
Deep Learning
Graph Representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge Graph
Graph Neural Networks
Limited-Sample Learning
Clinical Prediction
Biological Pathways