🤖 AI Summary
This work addresses the lack of rigorous evaluation of large language models’ (LLMs) defensive capabilities against dynamic, real-world fraud. We introduce Fraud-R1—the first multi-turn anti-fraud benchmark grounded in 8,564 authentic fraud cases—covering five fraud types (e.g., phishing, fake job postings) and enabling fine-grained robustness assessment across critical stages: trust establishment, urgency induction, and emotional manipulation. Methodologically, we propose a novel multi-turn adversarial evaluation paradigm distinguishing between Helpful-Assistant and Role-play settings, and establish a systematic evaluation framework spanning cross-model, cross-lingual, and cross-task dimensions. Experimental results reveal that role-playing significantly degrades defense performance; fake job detection achieves the lowest identification rate; and Chinese LMs exhibit an average 27.3% lower defense success rate than their English counterparts—highlighting an urgent need for multilingual fraud robustness research.
📝 Abstract
We introduce Fraud-R1, a benchmark designed to evaluate LLMs' ability to defend against internet fraud and phishing in dynamic, real-world scenarios. Fraud-R1 comprises 8,564 fraud cases sourced from phishing scams, fake job postings, social media, and news, categorized into 5 major fraud types. Unlike previous benchmarks, Fraud-R1 introduces a multi-round evaluation pipeline to assess LLMs' resistance to fraud at different stages, including credibility building, urgency creation, and emotional manipulation. Furthermore, we evaluate 15 LLMs under two settings: 1. Helpful-Assistant, where the LLM provides general decision-making assistance, and 2. Role-play, where the model assumes a specific persona, widely used in real-world agent-based interactions. Our evaluation reveals the significant challenges in defending against fraud and phishing inducement, especially in role-play settings and fake job postings. Additionally, we observe a substantial performance gap between Chinese and English, underscoring the need for improved multilingual fraud detection capabilities.