🤖 AI Summary
To address the growing sophistication and multilingual spread of scam websites, which make manual identification difficult, this paper proposes an LLM-driven zero-shot detection framework, ScamFerret. The system is an autonomous agent that requires no training data or annotations: it integrates webpage content, DNS records, and user reviews from multiple sources, reasoning over these signals from several perspectives to detect scam sites across languages and categories. Its key contribution is an end-to-end application of a large language model (GPT-4) for fully autonomous analysis, eliminating reliance on traditional feature engineering and labeled datasets. Experimental results demonstrate strong generalization and practical efficacy: the framework achieves 0.972 accuracy in classifying four scam types in English and 0.993 accuracy on online shopping websites across English, Chinese, and Japanese. These results validate its robust cross-lingual and cross-domain performance without task-specific adaptation.
📝 Abstract
With the rise of sophisticated scam websites that exploit human psychological vulnerabilities, distinguishing between legitimate and scam websites has become increasingly challenging. This paper presents ScamFerret, an innovative agent system employing a large language model (LLM) to autonomously collect and analyze data from a given URL to determine whether it is a scam. Unlike traditional machine learning models that require large datasets and feature engineering, ScamFerret leverages LLMs' natural language understanding to accurately identify scam websites of various types and languages without requiring additional training or fine-tuning. Our evaluation demonstrated that ScamFerret achieves 0.972 accuracy in classifying four scam types in English and 0.993 accuracy in classifying online shopping websites across three different languages, particularly when using GPT-4. Furthermore, we confirmed that ScamFerret collects and analyzes external information such as web content, DNS records, and user reviews as necessary, providing a basis for identifying scam websites from multiple perspectives. These results suggest that LLMs have significant potential in enhancing cybersecurity measures against sophisticated scam websites.
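The abstract describes an agent that gathers web content, DNS records, and user reviews for a URL and asks an LLM for a verdict. The following is a minimal sketch of that evidence-collection and prompting flow; the function names, data fields, and prompt wording are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of an agent-style scam-website analysis pipeline in the
# spirit of ScamFerret. All names and prompt wording are hypothetical.

def collect_evidence(url, page_text, dns_records, user_reviews):
    """Bundle the external signals the agent would gather for a given URL.

    In a real agent, page_text would come from fetching the URL,
    dns_records from DNS lookups, and user_reviews from review sites.
    Here they are passed in directly to keep the sketch self-contained.
    """
    return {
        "url": url,
        "page_text": page_text,
        "dns_records": dns_records,
        "user_reviews": user_reviews,
    }

def build_prompt(evidence):
    """Compose a zero-shot prompt asking the LLM for a scam/legitimate verdict."""
    reviews = "\n".join(f"- {r}" for r in evidence["user_reviews"]) or "- none found"
    return (
        f"Determine whether the website {evidence['url']} is a scam.\n\n"
        f"Page content:\n{evidence['page_text']}\n\n"
        f"DNS records: {', '.join(evidence['dns_records'])}\n\n"
        f"User reviews:\n{reviews}\n\n"
        "Answer 'scam' or 'legitimate' with a short justification."
    )

# Example: a suspicious online shop, analyzed from multiple perspectives.
evidence = collect_evidence(
    "http://example-shop.test",
    "Luxury goods at 95% off! Payment by bank transfer only.",
    ["A 203.0.113.7", "domain registered 12 days ago"],
    ["Paid but never received the item."],
)
prompt = build_prompt(evidence)
```

The resulting prompt string would then be sent to an LLM such as GPT-4; combining independent signals (content, DNS, reviews) in one prompt is what lets the model reason across perspectives without any task-specific training.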