🤖 AI Summary
To address the growing sophistication and multilingual spread of scam websites, which make manual identification difficult, this paper proposes an LLM-driven zero-shot detection framework, ScamFerret. The system is an autonomous agent that requires no training data or annotations: it integrates webpage content, DNS records, and user reviews from multiple sources, reasoning over these signals from several perspectives to detect scam sites across languages and categories. Its key contribution is an end-to-end application of a large language model (GPT-4) for fully autonomous analysis, eliminating reliance on traditional feature engineering and labeled datasets. Experimental results demonstrate strong generalization and practical efficacy: the framework achieves 0.972 accuracy in classifying four scam types in English and 0.993 accuracy on online shopping websites across English, Chinese, and Japanese. These results validate its robust cross-lingual and cross-domain performance without task-specific adaptation.
📝 Abstract
With the rise of sophisticated scam websites that exploit human psychological vulnerabilities, distinguishing between legitimate and scam websites has become increasingly challenging. This paper presents ScamFerret, an innovative agent system employing a large language model (LLM) to autonomously collect and analyze data from a given URL to determine whether it is a scam. Unlike traditional machine learning models that require large datasets and feature engineering, ScamFerret leverages LLMs' natural language understanding to accurately identify scam websites of various types and languages without requiring additional training or fine-tuning. Our evaluation demonstrated that ScamFerret achieves 0.972 accuracy in classifying four scam types in English and 0.993 accuracy in classifying online shopping websites across three different languages, particularly when using GPT-4. Furthermore, we confirmed that ScamFerret collects and analyzes external information such as web content, DNS records, and user reviews as necessary, providing a basis for identifying scam websites from multiple perspectives. These results suggest that LLMs have significant potential in enhancing cybersecurity measures against sophisticated scam websites.
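The abstract describes an agent that gathers web content, DNS records, and user reviews for a URL and asks an LLM for a verdict. The following is a minimal sketch of that evidence-collection and prompting flow; the function names, data fields, and prompt wording are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of an agent-style scam-website analysis pipeline in the
# spirit of ScamFerret. All names and prompt wording are hypothetical.

def collect_evidence(url, page_text, dns_records, user_reviews):
    """Bundle the external signals the agent would gather for a given URL.

    In a real agent, page_text would come from fetching the URL,
    dns_records from DNS lookups, and user_reviews from review sites.
    Here they are passed in directly to keep the sketch self-contained.
    """
    return {
        "url": url,
        "page_text": page_text,
        "dns_records": dns_records,
        "user_reviews": user_reviews,
    }

def build_prompt(evidence):
    """Compose a zero-shot prompt asking the LLM for a scam/legitimate verdict."""
    reviews = "\n".join(f"- {r}" for r in evidence["user_reviews"]) or "- none found"
    return (
        f"Determine whether the website {evidence['url']} is a scam.\n\n"
        f"Page content:\n{evidence['page_text']}\n\n"
        f"DNS records: {', '.join(evidence['dns_records'])}\n\n"
        f"User reviews:\n{reviews}\n\n"
        "Answer 'scam' or 'legitimate' with a short justification."
    )

# Example: a suspicious online shop, analyzed from multiple perspectives.
evidence = collect_evidence(
    "http://example-shop.test",
    "Luxury goods at 95% off! Payment by bank transfer only.",
    ["A 203.0.113.7", "domain registered 12 days ago"],
    ["Paid but never received the item."],
)
prompt = build_prompt(evidence)
```

The resulting prompt string would then be sent to an LLM such as GPT-4; combining independent signals (content, DNS, reviews) in one prompt is what lets the model reason across perspectives without any task-specific training.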