LOKI: Proactively Discovering Online Scam Websites by Mining Toxic Search Queries

📅 2025-09-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing approaches to finding candidate scam websites rely either on user reports, which are reactive and slow, or on proactive search engine queries, which suffer from low coverage and generalize poorly to new scam types. To address these limitations, this paper presents LOKI, a toxic keyword scoring model grounded in Learning Using Privileged Information (LUPI) and feature distillation from Search Engine Results Pages (SERPs). Seeded with a small set of known scam sites, the model scores search queries by how likely they are to return fraudulent websites and generalizes to previously unseen scam categories. It jointly leverages structured SERP feature extraction, keyword importance modeling, and an iterative expansion mechanism. Evaluated across ten major scam categories, the method achieves a 20.58× improvement in discovery over both heuristic and data-driven baselines. Using only 1,663 known scam sites as seeds, it identifies 52,493 previously unreported scam websites in the wild.

📝 Abstract

Online e-commerce scams, ranging from shopping scams to pet scams, globally cause millions of dollars in financial damage every year. In response, the security community has developed highly accurate detection systems able to determine if a website is fraudulent. However, finding candidate scam websites that can be passed as input to these downstream detection systems is challenging: relying on user reports is inherently reactive and slow, and proactive systems issuing search engine queries to return candidate websites suffer from low coverage and do not generalize to new scam types. In this paper, we present LOKI, a system designed to identify search engine queries likely to return a high fraction of fraudulent websites. LOKI implements a keyword scoring model grounded in Learning Under Privileged Information (LUPI) and feature distillation from Search Engine Result Pages (SERPs). We rigorously validate LOKI across 10 major scam categories and demonstrate a 20.58 times improvement in discovery over both heuristic and data-driven baselines across all categories. Leveraging a small seed set of only 1,663 known scam sites, we use the keywords identified by our method to discover 52,493 previously unreported scams in the wild. Finally, we show that LOKI generalizes to previously-unseen scam categories, highlighting its utility in surfacing emerging threats.
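The abstract describes a toxic query as one likely to return a high fraction of fraudulent websites. A minimal Python sketch of that idea follows. It is an illustration, not LOKI's implementation: the function names `query_toxicity` and `rank_queries`, the `is_scam` callback standing in for a downstream detector, and the `.example` hosts are all hypothetical.

```python
from urllib.parse import urlparse


def query_toxicity(result_urls, is_scam):
    """Fraction of distinct result hosts that the detector flags as scams.

    `is_scam` is a hypothetical stand-in for a downstream detection system.
    """
    hosts = {urlparse(url).hostname for url in result_urls}
    hosts.discard(None)  # drop entries that had no parseable host
    if not hosts:
        return 0.0
    flagged = sum(1 for host in hosts if is_scam(host))
    return flagged / len(hosts)


def rank_queries(serps, is_scam):
    """Sort queries from most to least toxic."""
    scored = [(query, query_toxicity(urls, is_scam)) for query, urls in serps.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up data for illustration only.
known_scam_hosts = {"cheap-puppies.example"}
serps = {
    "buy teacup puppy free shipping": [
        "https://cheap-puppies.example/shop",
        "https://cheap-puppies.example/contact",
        "https://shelter.example/adopt",
    ],
    "dog adoption near me": ["https://shelter.example/adopt"],
}
ranking = rank_queries(serps, known_scam_hosts.__contains__)
```

Counting distinct hosts rather than raw result URLs is a design choice of this sketch, so that one site occupying several result slots is not counted more than once.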

Problem

Research questions and friction points this paper is trying to address.

Proactively discovers scam websites via toxic search queries

Overcomes the slowness of reactive user reports and the low coverage of existing proactive methods

Generalizes to new scam types while improving discovery 20.58× over baselines

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mining toxic search queries for scam discovery

Using LUPI and SERP feature distillation

Generalizing to unseen scam categories proactively
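The page does not describe how LOKI combines LUPI with SERP feature distillation, so the following is only a sketch of one common way to realize LUPI, namely generalized distillation. It assumes that SERP-derived features are the privileged information (available at training time only) and that the deployed scorer sees keyword-only features. The logistic-regression models, the blending weight `lam`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_logreg(X, targets, lr=0.5, steps=2000):
    """Gradient descent on cross-entropy; targets may be soft values in [0, 1]."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        grad = Xb.T @ (sigmoid(Xb @ w) - targets) / len(targets)
        w -= lr * grad
    return w


def predict_proba(w, X):
    return sigmoid(add_bias(X) @ w)


def train_student(X_kw, X_serp, y, lam=0.5):
    """Teacher sees privileged SERP features; student sees keyword features only."""
    teacher = fit_logreg(X_serp, y)
    soft = predict_proba(teacher, X_serp)
    blended = lam * y + (1.0 - lam) * soft  # mix hard labels with teacher output
    return fit_logreg(X_kw, blended)


# Synthetic data for illustration only: label 1 means "toxic query".
rng = np.random.default_rng(0)
n = 400
y = (rng.random(n) < 0.5).astype(float)
X_serp = 2.0 * y[:, None] + rng.normal(size=(n, 3))  # privileged, more informative
X_kw = 1.0 * y[:, None] + rng.normal(size=(n, 4))    # available at scoring time
student = train_student(X_kw, X_serp, y)
scores = predict_proba(student, X_kw)
accuracy = float(np.mean((scores > 0.5) == (y == 1.0)))
```

Under these assumptions the student can score a new keyword without a search engine query being issued for it, because the SERP features are needed only while training the teacher.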

🔎 Similar Papers

Bridging Social Media and Search Engines: Dredge Words and the Detection of Unreliable Domains