TAGFN: A Text-Attributed Graph Dataset for Fake News Detection in the Age of LLMs

📅 2025-11-26

🤖 AI Summary

Problem: Fake news detection on text-attributed graphs lacks large-scale, realistic, well-annotated benchmarks, which limits research on LLM-based graph outlier detection. Method: The paper introduces TAGFN, a large-scale, real-world text-attributed graph dataset that frames fake news detection as graph outlier detection. Contribution/Results: The dataset and code are publicly released. TAGFN supports evaluation of both traditional and LLM-based graph outlier detection methods, and it can be used to fine-tune LLMs for misinformation detection. The authors expect it to support progress in robust graph-based outlier detection and trustworthy AI.

📝 Abstract

Large Language Models (LLMs) have recently revolutionized machine learning on text-attributed graphs, but the application of LLMs to graph outlier detection, particularly in the context of fake news detection, remains significantly underexplored. One of the key challenges is the scarcity of large-scale, realistic, and well-annotated datasets that can serve as reliable benchmarks for outlier detection. To bridge this gap, we introduce TAGFN, a large-scale, real-world text-attributed graph dataset for outlier detection, specifically fake news detection. TAGFN enables rigorous evaluation of both traditional and LLM-based graph outlier detection methods. Furthermore, it facilitates the development of misinformation detection capabilities in LLMs through fine-tuning. We anticipate that TAGFN will be a valuable resource for the community, fostering progress in robust graph-based outlier detection and trustworthy AI. The dataset is publicly available at https://huggingface.co/datasets/kayzliu/TAGFN and our code is available at https://github.com/kayzliu/tagfn.
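To make "graph outlier detection on a text-attributed graph" concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption: the four-node toy graph, the bag-of-words features, and the neighbor-disagreement score are hypothetical stand-ins, not TAGFN's schema or any method from the paper. It only shows the general evaluation pattern: each node carries text and a binary label, a detector assigns each node an outlier score, and the scores are ranked against the labels with AUROC.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy graph (NOT TAGFN's schema): node id -> (text, label),
# where label 1 marks a fake-news node, i.e. an outlier.
nodes = {
    0: ("city council approves new budget", 0),
    1: ("council budget vote passes in city", 0),
    2: ("new city budget approved by council", 0),
    3: ("aliens secretly replace mayor overnight", 1),
}
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 3)]  # undirected


def bow(text):
    """Bag-of-words vector for a node's text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def outlier_scores(nodes, edges):
    """Score = 1 - mean text similarity to neighbors (a simple heuristic)."""
    nbrs = {n: set() for n in nodes}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    feats = {n: bow(text) for n, (text, _) in nodes.items()}
    scores = {}
    for n in nodes:
        sims = [cosine(feats[n], feats[m]) for m in nbrs[n]]
        scores[n] = 1.0 - (sum(sims) / len(sims) if sims else 0.0)
    return scores


def auroc(scores, labels):
    """Probability that a random outlier outranks a random normal node."""
    pos = [scores[n] for n in scores if labels[n] == 1]
    neg = [scores[n] for n in scores if labels[n] == 0]
    if not pos or not neg:
        raise ValueError("need at least one node of each class")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


labels = {n: label for n, (_, label) in nodes.items()}
scores = outlier_scores(nodes, edges)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
print("AUROC:", auroc(scores, labels))
```

In this toy setup, node 3 shares no words with its neighbors, so it receives the highest score. A real benchmark run would replace the heuristic with a graph neural network or LLM-based detector and the toy graph with the released dataset, whose actual format should be taken from the dataset card.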

Problem

Research questions and friction points this paper is trying to address.

LLM-based graph outlier detection, particularly for fake news detection, remains significantly underexplored.

Large-scale, realistic, well-annotated text-attributed graph datasets that can serve as reliable outlier detection benchmarks are scarce.

Traditional and LLM-based graph outlier detection methods lack a shared benchmark for rigorous evaluation.

Innovation

Methods, ideas, or system contributions that make the work stand out.

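Introduces TAGFN, a large-scale, real-world text-attributed graph dataset that frames fake news detection as graph outlier detection.

Enables rigorous evaluation of both traditional and LLM-based graph outlier detection methods.

Supports fine-tuning LLMs to develop misinformation detection capabilities.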
