Can LLMs Fool Graph Learning? Exploring Universal Adversarial Attacks on Text-Attributed Graphs

📅 2026-03-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the emerging vulnerability of text-attributed graph (TAG) models to adversarial attacks that jointly exploit text and graph structure, a risk exacerbated by the absence of a unified attack framework compatible with both graph neural networks (GNNs) and large language models (LLMs). To bridge this gap, the authors propose BadGraph, a black-box attack framework that uses an LLM to simultaneously perturb node topology and textual semantics. By combining target influencer retrieval with graph prior knowledge, BadGraph constructs cross-modally aligned attack paths that are both stealthy and interpretable. Extensive experiments demonstrate its effectiveness, achieving up to a 76.3% performance drop across diverse TAG models and significantly outperforming existing attack strategies in universality, imperceptibility, and interpretability.

📝 Abstract
Text-attributed graphs (TAGs) enhance graph learning by integrating rich textual semantics and topological context for each node. While boosting expressiveness, they also expose new vulnerabilities in graph learning through text-based adversarial surfaces. Recent advances leverage diverse backbones, such as graph neural networks (GNNs) and pre-trained language models (PLMs), to capture both structural and textual information in TAGs. This diversity raises a key question: how can we design universal adversarial attacks that generalize across architectures to assess the security of TAG models? The challenge arises from the stark contrast in how different backbones (GNNs and PLMs) perceive and encode graph patterns, coupled with the fact that many PLMs are only accessible via APIs, limiting attacks to black-box settings. To address this, we propose BadGraph, a novel attack framework that deeply elicits large language models' (LLMs') understanding of general graph knowledge to jointly perturb both node topology and textual semantics. Specifically, we design a target influencer retrieval module that leverages graph priors to construct cross-modally aligned attack shortcuts, thereby enabling efficient LLM-based perturbation reasoning. Experiments show that BadGraph achieves universal and effective attacks across GNN- and LLM-based reasoners, with up to a 76.3% performance drop, while theoretical and empirical analyses confirm its stealthy yet interpretable nature.
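The abstract describes BadGraph as jointly perturbing node topology and textual semantics, guided by a target influencer retrieval module built on graph priors. As a rough, hypothetical illustration only (the function names, the toy data, and the degree-based influencer rule are assumptions for this sketch, not the paper's actual method), such a joint perturbation on a tiny text-attributed graph might look like:

```python
# Hypothetical sketch: joint topology + text perturbation on a toy
# text-attributed graph. A high-degree node stands in for the paper's
# "target influencer"; the trigger phrase stands in for an LLM-generated
# semantic perturbation. None of this is the paper's exact algorithm.

def select_influencer(adj, target):
    # Simple graph prior: pick the highest-degree node other than the target.
    return max((n for n in adj if n != target), key=lambda n: len(adj[n]))

def perturb(adj, texts, target, trigger="graph neural networks survey"):
    # 1) Topology perturbation: link the target to an influential node.
    influencer = select_influencer(adj, target)
    adj[target].add(influencer)
    adj[influencer].add(target)
    # 2) Text perturbation: append a semantics-shifting trigger phrase.
    texts[target] = texts[target] + " " + trigger
    return influencer

# Toy TAG: node 0 is the isolated attack target; nodes 1-3 form a triangle.
adj = {0: set(), 1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
texts = {0: "A survey of relational databases.",
         1: "Graph neural networks.",
         2: "Network embedding methods.",
         3: "Temporal graph models."}

influencer = perturb(adj, texts, target=0)
print(influencer, sorted(adj[0]))
```

In a real black-box attack the trigger text would come from LLM reasoning over the retrieved influencer's attributes rather than a fixed string, and the edge budget would be constrained to keep the perturbation imperceptible.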
Problem

Research questions and friction points this paper is trying to address.

text-attributed graphs
universal adversarial attacks
graph learning
large language models
black-box attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

universal adversarial attack
text-attributed graphs
large language models
graph neural networks
black-box attack
Zihui Chen
School of Cyberspace, Hangzhou Dianzi University, Hangzhou, China
Yuling Wang
School of Cyberspace, Hangzhou Dianzi University, Hangzhou, China
Pengfei Jiao
Hangzhou Dianzi University
Probabilistic Graphical Models, Graph Neural Networks, Network Embedding, Temporal Networks, Recommender Systems
Kai Wu
Hangzhou Dianzi University, Hangzhou, China
Xiao Wang
Professor, Beihang University
network embedding, graph neural networks, data mining, machine learning
Xiang Ao
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Dalin Zhang
Hangzhou Dianzi University, Hangzhou, China