Rethinking Fraud Safety Evaluation: Multi-Round Attacks Reveal Safety-Utility Tradeoffs in Graph-Context LLM Defenders

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Single-round safety evaluations fail to capture the safety-utility trade-off that emerges when fraud attempts escalate across multiple rounds. This work proposes an evaluation framework based on multi-round replay and adaptive attacks, which enables fine-grained analysis of how graph-context large language models balance refusal timing, false refusals of benign inputs, and use of risk signals. The study attributes the observed cost to how the LLM consumes graph context rather than to the graph encoder itself. Experiments with Qwen-1.5B paired with static and temporal graph neural networks show that graph-context defenders safely refuse fraudulent attempts earlier across attack rounds, at the cost of more false refusals of benign inputs. Temporal graph context appears more robust than static context but does not yet significantly outperform it on the primary metrics.
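To make the two headline quantities concrete, here is a minimal sketch of how refusal timing and benign false refusals could be computed from per-round refusal flags. The `Conversation` record, the function names, and the round cutoff `k` are illustrative assumptions, not the paper's actual metric definitions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Conversation:
    """Hypothetical record: one multi-round exchange with a defender."""
    is_fraud: bool          # ground-truth label for the whole conversation
    refused: List[bool]     # refused[i] is True if the defender refused in round i+1


def first_refusal_round(conv: Conversation) -> Optional[int]:
    """Return the 1-indexed round of the first refusal, or None if never refused."""
    for round_number, did_refuse in enumerate(conv.refused, start=1):
        if did_refuse:
            return round_number
    return None


def early_refusal_rate(convs: List[Conversation], k: int = 1) -> float:
    """Fraction of fraud conversations refused at or before round k (assumed cutoff)."""
    fraud = [c for c in convs if c.is_fraud]
    if not fraud:
        return 0.0
    hits = 0
    for c in fraud:
        first = first_refusal_round(c)
        if first is not None and first <= k:
            hits += 1
    return hits / len(fraud)


def benign_over_refusal_rate(convs: List[Conversation]) -> float:
    """Fraction of benign conversations that were refused in any round."""
    benign = [c for c in convs if not c.is_fraud]
    if not benign:
        return 0.0
    return sum(1 for c in benign if any(c.refused)) / len(benign)
```

Reporting both numbers side by side is the point: a defender that refuses everything in round one maximizes the first rate and the second rate at the same time.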

📝 Abstract

Single-turn safety evaluation is a poor proxy for real fraud defense, where attackers escalate across multiple rounds. This paper evaluates fraud defenders under replay and adaptive multi-round attacks and measures when a defender refuses, not just whether it eventually refuses. On a frozen multi-round suite built from Fraud-R1, graph-context defenders improve early safe refusal relative to text-only baselines under both replay and adaptive fraud pressure, but they also produce substantially more benign over-refusal. Direct probing of the trained graph encoder, together with paired shuffle-risk ablations on both fraud and benign sides replicated across two seeds on the Qwen-1.5B backbone, localizes this cost to how the defender LLM consumes structured context rather than to graph-encoder quality: the encoder cleanly separates fraud from benign, while the LLM responds primarily to the presence of structured graph fields and only secondarily, and asymmetrically, to risk-score magnitude. Temporal graph context is directionally stronger than static and significantly better grounded, but is not yet conclusively superior on the main refusal metrics. The contribution is evaluative and measurement-oriented: robust fraud assessment must be multi-round, must report refusal timing, must account for benign false positives alongside fraud-side safety gains, and must localize observed costs to the graph signal or to how the LLM consumes it.
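The shuffle-risk ablation mentioned in the abstract can be illustrated with a rough sketch: permute risk scores across examples while leaving the structured graph fields in place, then compare refusal rates before and after. Everything here is an assumption made for illustration, including the dictionary layout, the `risk_score` field name, and the `defender` callable; the paper's actual ablation protocol may differ.

```python
import random
from typing import Callable, Dict, List

# Hypothetical context format: a dict of structured graph fields that
# includes a numeric "risk_score". A defender is any callable that maps
# such a context to True (refuse) or False (comply).
Context = Dict[str, object]
Defender = Callable[[Context], bool]


def shuffle_risk(contexts: List[Context], seed: int) -> List[Context]:
    """Return copies of the contexts with risk scores permuted across examples.

    All other fields stay attached to their original example, so the
    structured graph context is still present but the score is uninformative.
    """
    rng = random.Random(seed)
    scores = [c["risk_score"] for c in contexts]
    rng.shuffle(scores)
    return [{**c, "risk_score": s} for c, s in zip(contexts, scores)]


def refusal_rate(defender: Defender, contexts: List[Context]) -> float:
    """Fraction of contexts on which the defender refuses."""
    if not contexts:
        return 0.0
    return sum(1 for c in contexts if defender(c)) / len(contexts)


def shuffle_risk_delta(defender: Defender, contexts: List[Context], seed: int = 0) -> float:
    """Change in refusal rate caused by shuffling risk scores (shuffled minus original)."""
    original = refusal_rate(defender, contexts)
    shuffled = refusal_rate(defender, shuffle_risk(contexts, seed))
    return shuffled - original
```

Under these assumptions, a delta near zero on both the fraud and benign subsets would suggest the defender reacts to the presence of graph fields rather than to the score itself, which is the pattern the abstract describes. Running the comparison with more than one `seed` mirrors the abstract's two-seed replication.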

Problem

Research questions and friction points this paper is trying to address.

fraud detection

multi-round attacks

safety-utility tradeoff

graph-context LLM

over-refusal

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-round attacks

graph-context LLM

safety-utility tradeoff