Democratizing Large Language Model-Based Graph Data Augmentation via Latent Knowledge Graphs

📅 2025-02-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing graph data augmentation methods often neglect the contextual information in the dataset, while mainstream large language model (LLM)-empowered graph learning approaches rely on white-box access to model weights or intermediate features, making them incompatible with proprietary, closed-source LLMs. To address this, we propose DemoGraph, a black-box, context-driven graph data augmentation framework. DemoGraph requires no internal model parameters: it prompts closed-source LLMs with context-related text to generate latent knowledge graphs, which are then stochastically merged into the original graph during training, while a granularity-aware prompting strategy and an instruction fine-tuning module control the sparsity of the augmented graph. This mitigates graph data scarcity and noise. Extensive experiments report consistent gains across diverse graph learning tasks, including node classification, link prediction, and graph classification, with particularly notable improvements in electronic health record (EHR) analytics, where the method improves both predictive accuracy and interpretability.
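As a rough illustration of the black-box setting described above, the sketch below assumes a hypothetical `call_llm` wrapper around any closed-source chat or completion API and a simple `head | relation | tail` output format; the paper's actual prompts and parsing are not reproduced here.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]

def generate_kg_triples(context: str,
                        call_llm: Callable[[str], str]) -> List[Triple]:
    """Ask a black-box LLM for (head, relation, tail) triples describing a
    textual context, then parse the reply into an edge list.

    Only the model's text output is consumed; no weights or hidden states
    are required, which matches the black-box setting.
    """
    prompt = (
        "Given the following context, list factual relations, one per line, "
        "in the form: head | relation | tail.\n\n"
        f"Context: {context}\n"
    )
    raw = call_llm(prompt)  # hypothetical wrapper around a closed-source API
    triples: List[Triple] = []
    for line in raw.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples
```

Because only the returned text is used, the same routine works with any proprietary LLM endpoint.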

📝 Abstract
Data augmentation is necessary for graph representation learning because graph data are often scarce and noisy. Most existing augmentation methods overlook the context information inherent in the dataset, relying solely on the graph structure for augmentation. Despite the success of some large language model (LLM)-based graph learning methods, most are white-box approaches that require access to the weights or latent features of open-access LLMs, which makes them difficult to democratize, since existing LLMs are largely closed-source for commercial reasons. To overcome these limitations, we propose DemoGraph, a black-box, context-driven graph data augmentation approach guided by LLMs. Using the text prompt as context-related information, we task the LLM with generating knowledge graphs (KGs), which allow us to capture structural interactions from the text outputs. We then design a dynamic merging schema to stochastically integrate the LLM-generated KGs into the original graph during training. To control the sparsity of the augmented graph, we further devise a granularity-aware prompting strategy and an instruction fine-tuning module, which seamlessly generate text prompts according to different granularity levels of the dataset. Extensive experiments on various graph learning tasks validate the effectiveness of our method over existing graph data augmentation methods. Notably, our approach excels in scenarios involving electronic health records (EHRs), which confirms its effective use of contextual knowledge and leads to enhanced predictive performance and interpretability.
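The dynamic merging schema can be pictured as a per-epoch stochastic union of the original edges with LLM-derived KG edges. The sketch below is a minimal illustration under assumed names (`keep_prob`, string node identifiers); it is not the paper's exact schema.

```python
import random
from typing import Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]

def merge_kg_edges(graph_edges: Set[Edge],
                   kg_edges: Iterable[Edge],
                   keep_prob: float = 0.3,
                   rng: Optional[random.Random] = None) -> Set[Edge]:
    """Build the edge set for one training pass.

    The original graph is always kept; each LLM-derived KG edge is added
    independently with probability `keep_prob`, so successive epochs see
    different stochastic augmentations of the same base graph.
    """
    rng = rng or random.Random()
    augmented = set(graph_edges)
    for edge in kg_edges:
        if rng.random() < keep_prob:
            augmented.add(edge)
    return augmented

# Hypothetical training loop: resample the augmentation every epoch.
# for epoch in range(num_epochs):
#     edges = merge_kg_edges(base_edges, kg_edges, keep_prob=0.3)
#     train_one_epoch(model, edges)
```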
Problem

Research questions and friction points this paper is trying to address.

Addresses graph data scarcity and noise issues
Proposes black-box context-driven graph augmentation
Enhances predictive performance and interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Black-box LLM-driven graph augmentation
Dynamic merging of knowledge graphs
Granularity-aware prompting strategy
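To give a concrete, purely illustrative sense of granularity-aware prompting, the sketch below varies the prompt wording and triple budget by granularity level as a proxy for controlling KG sparsity. The level names, templates, and `max_triples` parameter are assumptions, and the paper's instruction fine-tuning module is omitted.

```python
from typing import Iterable

# Illustrative granularity levels; the actual levels would come from the
# dataset (e.g., coarse concepts vs. fine-grained entities).
PROMPT_STYLES = {
    "coarse": "Relate only high-level categories of the terms below.",
    "fine": "Relate the individual terms below, including specific attributes.",
}

def build_granularity_prompt(terms: Iterable[str],
                             level: str = "coarse",
                             max_triples: int = 20) -> str:
    """Compose a prompt whose wording and triple budget depend on the chosen
    granularity level, a rough proxy for how dense the generated KG (and
    hence the augmented graph) becomes."""
    return (
        f"{PROMPT_STYLES[level]} "
        f"Output at most {max_triples} lines of the form head | relation | tail.\n"
        f"Terms: {', '.join(terms)}"
    )
```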