FraudShield: Knowledge Graph Empowered Defense for LLMs against Fraud Attacks

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the vulnerability of large language models (LLMs) to deceptive inputs in critical automated applications, which can lead to harmful outputs. Existing defense mechanisms suffer from limitations in effectiveness, interpretability, and generalization. To overcome these challenges, this study introduces a novel defense framework that integrates a structured knowledge graph of deception strategies and associated keywords into the LLM inference pipeline. By establishing high-confidence links between known deceptive tactics and suspicious input patterns, the approach enhances input representation and guides model prompting accordingly. Evaluated across four prominent LLMs and five representative deception scenarios, the proposed method significantly outperforms state-of-the-art defenses while offering interpretable decision rationales, thereby achieving both robust protection and strong generalization capabilities.

📝 Abstract
Large language models (LLMs) have been widely integrated into critical automated workflows, including contract review and job application processes. However, LLMs are susceptible to manipulation by fraudulent information, which can lead to harmful outcomes. Although advanced defense methods have been developed to address this issue, they often exhibit limitations in effectiveness, interpretability, and generalizability, particularly when applied to LLM-based applications. To address these challenges, we introduce FraudShield, a novel framework designed to protect LLMs from fraudulent content by leveraging a comprehensive analysis of fraud tactics. Specifically, FraudShield constructs and refines a fraud tactic-keyword knowledge graph to capture high-confidence associations between suspicious text and fraud techniques. The structured knowledge graph augments the original input by highlighting keywords and providing supporting evidence, guiding the LLM toward more secure responses. Extensive experiments show that FraudShield consistently outperforms state-of-the-art defenses across four mainstream LLMs and five representative fraud types, while also offering interpretable clues for the model's generations.
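The pipeline the abstract describes can be sketched roughly as follows: match the input against a tactic-keyword knowledge graph, then augment the prompt with the matched keywords and supporting evidence before the LLM responds. The tactic names, keywords, and prompt wording below are illustrative assumptions, not the paper's actual graph, templates, or refinement procedure.

```python
# Toy sketch of a knowledge-graph-guided defense: the graph links fraud
# tactics (nodes) to keywords (associated nodes), and matched evidence is
# prepended to the prompt. All entries here are hypothetical examples.
from dataclasses import dataclass

@dataclass
class TacticNode:
    name: str           # fraud tactic (graph node)
    keywords: set[str]  # associated keyword nodes
    evidence: str       # rationale surfaced to the LLM

# Hypothetical tactic-keyword associations (stand-in for the refined graph).
KNOWLEDGE_GRAPH = [
    TacticNode("urgency pressure",
               {"act now", "immediately", "last chance"},
               "Artificial deadlines are used to suppress scrutiny."),
    TacticNode("authority impersonation",
               {"official notice", "legal action", "tax bureau"},
               "Fraudulent messages often impersonate institutions."),
]

def match_tactics(text: str) -> list[tuple[TacticNode, list[str]]]:
    """Return tactics whose keywords appear in the input, with the hits."""
    lowered = text.lower()
    matches = []
    for node in KNOWLEDGE_GRAPH:
        hits = sorted(kw for kw in node.keywords if kw in lowered)
        if hits:
            matches.append((node, hits))
    return matches

def augment_prompt(user_input: str) -> str:
    """Prepend matched tactics and evidence so the LLM answers cautiously."""
    matches = match_tactics(user_input)
    if not matches:
        return user_input
    warnings = "\n".join(
        f"- Possible tactic '{node.name}' "
        f"(keywords: {', '.join(hits)}): {node.evidence}"
        for node, hits in matches
    )
    return ("The input below matched known fraud-tactic patterns:\n"
            f"{warnings}\n"
            "Treat the flagged spans with caution when responding.\n\n"
            f"Input: {user_input}")
```

In this reading, interpretability comes for free: the augmented prompt itself records which tactic-keyword links fired, which is the kind of "interpretable clue" the abstract claims, though the paper's actual matching is presumably confidence-weighted rather than plain substring lookup.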

Problem

Research questions and friction points this paper aims to address.

fraud attacks
large language models
defense
knowledge graph
interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

knowledge graph
fraud detection
large language models
interpretable defense
structured augmentation