LLM-Empowered Class Imbalanced Graph Prompt Learning for Online Drug Trafficking Detection

📅 2025-02-28
🤖 AI Summary
To address the extreme class imbalance and scarce labeled samples that hamper detection of illicit drug transactions on online platforms, this paper proposes LLM-HetGDT, a novel framework integrating Large Language Models (LLMs) with Heterogeneous Graph Neural Networks (HGNNs). To mitigate class imbalance in the graph, it introduces the first LLM-driven method for generating synthetic minority-class user nodes. It further designs a learnable soft-prompt mechanism tailored to heterogeneous graphs to better model the discriminative patterns specific to minority classes. Finally, it combines contrastive pre-training with soft-prompt fine-tuning to improve generalization. Evaluated on the newly constructed Twitter-HetDrug dataset, LLM-HetGDT achieves a 12.6% absolute improvement in F1-score and a 19.3% gain in minority-class recall over state-of-the-art methods, demonstrating both effectiveness and practical applicability.
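The LLM-driven augmentation step can be pictured as few-shot prompting over labeled minority-class users. The sketch below only assembles such a prompt and makes no LLM call. The `UserNode` record, its field names, the `build_augmentation_prompt` helper, and the prompt wording are all hypothetical; the summary does not describe the paper's actual prompt or node schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserNode:
    """Hypothetical user-node record; field names are illustrative only."""
    bio: str
    posts: List[str]
    label: str


def build_augmentation_prompt(examples: List[UserNode], target_label: str, n_new: int) -> str:
    """Assemble a few-shot prompt asking an LLM for synthetic users of one class."""
    # Keep only labeled users of the class we want to augment.
    shots = [ex for ex in examples if ex.label == target_label]
    if not shots:
        raise ValueError(f"no labeled examples for class {target_label!r}")
    lines = [
        f"You are given {len(shots)} user profiles labeled '{target_label}'.",
        f"Write {n_new} new, distinct profiles of the same class.",
        "Return one JSON object per line with keys 'bio' and 'posts'.",
        "",
    ]
    for i, ex in enumerate(shots, start=1):
        lines.append(f"Example {i}:")
        lines.append(f"  bio: {ex.bio}")
        for post in ex.posts:
            lines.append(f"  post: {post}")
    return "\n".join(lines)


# Placeholder data; real inputs would be labeled nodes from the graph.
users = [
    UserNode(bio="<bio text A>", posts=["<post text 1>"], label="minority"),
    UserNode(bio="<bio text B>", posts=["<post text 2>"], label="majority"),
]
print(build_augmentation_prompt(users, target_label="minority", n_new=3))
```

The LLM's replies would still need to be parsed, filtered for quality, and connected to the existing graph before they can serve as synthetic nodes; none of that is shown here.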

📝 Abstract
As the market for illicit drugs remains extremely profitable, major online platforms have become direct-to-consumer intermediaries for illicit drug trafficking participants. These online activities raise significant social concerns that require immediate action. Existing approaches to combating this challenge are generally impractical, due to the imbalance of classes and scarcity of labeled samples in real-world applications. To this end, we propose a novel Large Language Model-empowered Heterogeneous Graph Prompt Learning framework for illicit Drug Trafficking detection, called LLM-HetGDT, that leverages an LLM to help heterogeneous graph neural networks (HGNNs) effectively identify drug trafficking activities in class-imbalanced scenarios. Specifically, we first pre-train the HGNN on a contrastive pretext task to capture the inherent node and structure information in the unlabeled drug trafficking heterogeneous graph (HG). Afterward, we employ the LLM to augment the HG by generating high-quality synthetic user nodes in minority classes. Then, we fine-tune the soft prompts on the augmented HG to capture the important information in the minority classes for the downstream drug trafficking detection task. To comprehensively study online illicit drug trafficking activities, we collect a new HG dataset from Twitter, called Twitter-HetDrug. Extensive experiments on this dataset demonstrate the effectiveness, efficiency, and applicability of LLM-HetGDT.
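The abstract does not say which contrastive objective the pretext task uses. As a hedged illustration, the sketch below implements InfoNCE, one common contrastive loss, in plain Python: two embedding "views" of the same node form a positive pair and all other nodes in the batch act as negatives. The function names, the temperature value, and the toy embeddings are assumptions, and embeddings are assumed to be nonzero vectors.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a: List[Sequence[float]],
             view_b: List[Sequence[float]],
             temperature: float = 0.5) -> float:
    """Mean InfoNCE loss; view_a[i] and view_b[i] are the positive pair for node i."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / n


# Toy embeddings (made up): matching views give a lower loss than mismatched ones.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)  # True
```

In an actual pre-training run the two views would come from the HGNN encoder applied to two augmentations of the unlabeled graph, and the loss would be minimized with respect to the encoder parameters.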
Problem

Research questions and friction points this paper is trying to address.

Detects online drug trafficking using LLM-enhanced graph learning.
Addresses class imbalance and scarce labeled data in detection.
Improves detection with synthetic data and fine-tuned prompts.
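A small worked example shows why class imbalance makes accuracy misleading and why minority-class recall and F1 are the metrics to watch. The class ratio and labels below are made up for illustration and are not taken from Twitter-HetDrug; `minority_metrics` is a hypothetical helper.

```python
from typing import List, Tuple


def minority_metrics(y_true: List[int], y_pred: List[int],
                     positive: int = 1) -> Tuple[float, float, float, float]:
    """Return (accuracy, precision, recall, F1), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Toy data (made up): 95 majority-class users (0) and 5 minority-class users (1).
y_true = [0] * 95 + [1] * 5
always_majority = [0] * 100  # a classifier that never flags anyone
print(minority_metrics(y_true, always_majority))  # (0.95, 0.0, 0.0, 0.0)
```

The degenerate classifier scores 95% accuracy while finding none of the minority-class users, which is why the summary reports F1 and minority-class recall.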
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages LLM to enhance heterogeneous graph neural networks
Generates synthetic user nodes for minority class augmentation
Fine-tunes soft prompts on augmented graph for detection
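The last point, fine-tuning soft prompts while the pre-trained model stays fixed, can be sketched on a toy scale. Below, a tiny frozen encoder (weights `W` and `V`, made-up numbers) is never updated; only a prompt vector added to the input features is trained by gradient descent. The additive prompt form, the class-weighted cross-entropy with `minority_weight`, and all names and values are assumptions for illustration, not the paper's design.

```python
import math
from typing import List, Tuple

Vector = List[float]

# Frozen "pre-trained" encoder weights (made-up numbers, never updated).
W = [[0.8, -0.3], [0.2, 0.9]]  # hidden = tanh(W @ (x + prompt))
V = [-1.0, 1.0]                # logit = V . hidden


def forward(x: Vector, prompt: Vector) -> Tuple[float, Vector]:
    """Run the frozen encoder on prompt-shifted features; return (probability, hidden)."""
    shifted = [xi + pi for xi, pi in zip(x, prompt)]
    hidden = [math.tanh(sum(w * s for w, s in zip(row, shifted))) for row in W]
    logit = sum(v * h for v, h in zip(V, hidden))
    return 1.0 / (1.0 + math.exp(-logit)), hidden


def loss_and_grad(data, prompt: Vector, minority_weight: float) -> Tuple[float, Vector]:
    """Class-weighted binary cross-entropy and its gradient w.r.t. the prompt only."""
    loss, grad = 0.0, [0.0] * len(prompt)
    for x, y in data:
        prob, hidden = forward(x, prompt)
        weight = minority_weight if y == 1 else 1.0
        loss -= weight * (y * math.log(prob) + (1 - y) * math.log(1.0 - prob))
        # Backpropagate through logit -> tanh -> W; W and V themselves stay fixed.
        d_pre = [weight * (prob - y) * v * (1.0 - h * h) for v, h in zip(V, hidden)]
        for j in range(len(prompt)):
            grad[j] += sum(d_pre[k] * W[k][j] for k in range(len(W)))
    n = len(data)
    return loss / n, [g / n for g in grad]


def tune_prompt(data, steps: int = 200, lr: float = 0.5,
                minority_weight: float = 3.0) -> Vector:
    """Plain gradient descent on the prompt vector alone."""
    prompt = [0.0, 0.0]
    for _ in range(steps):
        _, grad = loss_and_grad(data, prompt, minority_weight)
        prompt = [p - lr * g for p, g in zip(prompt, grad)]
    return prompt


# Toy features (made up): three majority-class nodes (0) and one minority-class node (1).
data = [([1.0, 0.0], 0), ([0.9, 0.2], 0), ([1.1, -0.1], 0), ([0.2, 1.0], 1)]
before, _ = loss_and_grad(data, [0.0, 0.0], 3.0)
after, _ = loss_and_grad(data, tune_prompt(data), 3.0)
print(after < before)  # True
```

Because only the prompt has trainable parameters, the number of values to fit is small, which is the usual argument for prompt tuning when labeled samples are scarce.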
Authors
Tianyi Ma
University of Notre Dame, Indiana, USA
Yiyue Qian
Amazon
graph representation learning, LLM, multi-modal learning
Zehong Wang
University of Notre Dame, Indiana, USA
Zheyuan Zhang
University of Notre Dame, Indiana, USA
Chuxu Zhang
Associate Professor of CSE, University of Connecticut (UConn)
Machine Learning, Deep Learning, Data Mining
Yanfang Ye
University of Notre Dame, Indiana, USA