🤖 AI Summary
Existing approaches struggle to model how multi-turn real-world scams, such as romance or investment fraud, evolve and escalate in risk. This work proposes PreScam, a benchmark built from 177,989 user-submitted reports, which are filtered into 11,573 structured scam dialogues spanning 20 scam categories. Each dialogue is hierarchically organized according to a proposed “scam kill chain” framework that captures the scam lifecycle, and each turn is annotated with fine-grained labels for the scammer’s psychological tactics and the victim’s responses. These annotations support two tasks: real-time termination prediction and prediction of the scammer’s next action. Experiments show that supervised encoders substantially outperform zero-shot large language models (LLMs) on termination prediction, while even strong LLMs achieve only moderate performance on next-action prediction, indicating that current models do not yet track how risk escalates and how manipulation unfolds across turns.
📝 Abstract
Conversational scams, such as romance and investment scams, are emerging as a major form of online fraud. Unlike one-shot scam lures such as fake lottery or unpaid toll messages, they unfold through multi-turn conversations in which scammers gradually manipulate victims using evolving psychological techniques. However, existing research mainly focuses on static scam detection or synthetic scams, leaving open whether language models can understand how real-world scams progress over time. We introduce PreScam, a benchmark for modeling scam progression from early conversations. Built from user-submitted scam reports, PreScam filters and structures 177,989 raw reports into 11,573 conversational scam instances spanning 20 scam categories. Each instance is hierarchically structured according to the scam lifecycle defined by the proposed scam kill chain, and further annotated at the turn level with scammer psychological actions and victim responses. We benchmark models on two tasks: real-time termination prediction, which estimates whether a conversation is approaching the termination stage, and scammer action prediction, which forecasts the scammer's subsequent actions. Results show a clear gap between surface-level fluency and progression modeling: supervised encoders substantially outperform zero-shot LLMs on real-time termination prediction, while next-action prediction remains only moderately successful even for strong LLMs. Taken together, these results show that current models can capture some scam-related cues, yet still struggle to track how risk escalates and how manipulation unfolds across turns.
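To make the two benchmark tasks concrete, the following is a minimal sketch of how prefix-based examples could be derived from one annotated dialogue. It is an illustration under stated assumptions, not the PreScam data format or the authors' procedure: the `Turn` fields, the stage and action label names, the `horizon` parameter, and both helper functions are hypothetical, and the abstract does not specify how "approaching the termination stage" is operationalized.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    speaker: str  # "scammer" or "victim"
    text: str
    stage: str    # hypothetical lifecycle stage label (not the paper's label set)
    action: str   # hypothetical turn-level tactic or response label


TERMINATION = "termination"  # assumed name for the final lifecycle stage


def termination_prefixes(turns: List[Turn], horizon: int = 2) -> List[Tuple[List[Turn], int]]:
    """Pair each conversation prefix with a binary label.

    Assumption: a prefix is positive if any of the next `horizon` turns
    belongs to the termination stage. Prefixes that already end in a
    termination turn are not emitted.
    """
    examples = []
    for k in range(1, len(turns) + 1):
        if turns[k - 1].stage == TERMINATION:
            break
        upcoming = turns[k:k + horizon]
        label = int(any(t.stage == TERMINATION for t in upcoming))
        examples.append((turns[:k], label))
    return examples


def next_action_examples(turns: List[Turn]) -> List[Tuple[List[Turn], str]]:
    """Pair each prefix with the action label of the next scammer turn, if any."""
    examples = []
    for k in range(1, len(turns)):
        later_scammer = [t for t in turns[k:] if t.speaker == "scammer"]
        if later_scammer:
            examples.append((turns[:k], later_scammer[0].action))
    return examples


# Toy dialogue with invented text and labels, for illustration only.
toy = [
    Turn("scammer", "Hi, I found your profile and wanted to say hello.", "contact", "rapport"),
    Turn("victim", "Hello, who is this?", "contact", "engage"),
    Turn("scammer", "I made great returns on this platform.", "persuasion", "authority"),
    Turn("victim", "How does it work?", "persuasion", "inquire"),
    Turn("scammer", "Send the deposit today or lose the slot.", "payment", "urgency"),
    Turn("victim", "I am not sending money. Goodbye.", "termination", "refuse"),
]

termination_labels = [label for _, label in termination_prefixes(toy)]
action_targets = [target for _, target in next_action_examples(toy)]
print(termination_labels)
print(action_targets)
```

Framing both tasks over prefixes mirrors the real-time setting described in the abstract, where a model sees only the turns so far; a supervised encoder or a zero-shot LLM would each consume the prefix text and be scored against these labels.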