PhishParrot: LLM-Driven Adaptive Crawling to Unveil Cloaked Phishing Sites

📅 2025-08-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Phishing websites use cloaking to serve benign pages to security crawlers while delivering malicious content to genuine users, which lets them evade conventional detection. To address this, we propose PhishParrot, a system that combines large language models (LLMs) with case-based reasoning for context-aware analysis of crawling data. PhishParrot generates user configurations adapted to each target (including user agents, geolocated IP addresses, and realistic behavioral sequences) to build crawling environments capable of bypassing cloaking defenses. It adjusts browser and network parameters together, drawing on phishing-site information collected from diverse environments and on similar past cases. In a 21-day real-world evaluation, PhishParrot improved detection accuracy by up to 33.8% over standard analysis systems and produced 91 distinct crawling configurations, improving robustness against cloaked phishing sites.

📝 Abstract

Phishing attacks continue to evolve, with cloaking techniques posing a significant challenge to detection efforts. Cloaking allows attackers to display phishing sites only to specific users while presenting legitimate pages to security crawlers, rendering traditional detection systems ineffective. This research proposes PhishParrot, a novel crawling environment optimization system designed to counter cloaking techniques. PhishParrot leverages the contextual analysis capabilities of Large Language Models (LLMs) to identify potential patterns in crawling information, enabling the construction of optimal user profiles capable of bypassing cloaking mechanisms. The system accumulates information on phishing sites collected from diverse environments. It then adapts browser settings and network configurations to match the attacker's target user conditions based on information extracted from similar cases. A 21-day evaluation showed that PhishParrot improved detection accuracy by up to 33.8% over standard analysis systems, yielding 91 distinct crawling environments for diverse conditions targeted by attackers. The findings confirm that the combination of similar-case extraction and LLM-based context analysis is an effective approach for detecting cloaked phishing attacks.
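
The abstract describes two steps: retrieving similar past cases, then using them to choose browser and network settings. The sketch below illustrates only that flow and is not the authors' implementation. Every name in it (`CrawlProfile`, `Case`, `similar_cases`, `propose_profile`), the feature-tag encoding, and the use of Jaccard similarity are assumptions made for illustration; the LLM-based context analysis is replaced by a trivial stand-in that reuses the closest case's profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlProfile:
    """Hypothetical crawling environment: browser and network settings."""
    user_agent: str
    accept_language: str
    exit_country: str  # country of the proxy or VPN exit node (assumed field)


@dataclass(frozen=True)
class Case:
    """A past observation: site features and the profile that revealed the site."""
    features: frozenset
    profile: CrawlProfile


def jaccard(a, b):
    """Jaccard similarity between two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_cases(features, case_base, k=3):
    """Return the k stored cases most similar to the new site's features."""
    ranked = sorted(
        case_base,
        key=lambda case: jaccard(features, case.features),
        reverse=True,
    )
    return ranked[:k]


def propose_profile(cases, default):
    """Stand-in for the LLM step: reuse the profile of the closest case."""
    return cases[0].profile if cases else default
```

In a real system the retrieved cases would be passed to an LLM as context rather than copied directly; the stand-in only marks where that step would sit.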

Problem

Research questions and friction points this paper is trying to address.

Detect cloaked phishing sites bypassing traditional crawlers

Optimize crawling environments to mimic attacker-targeted user profiles

Improve accuracy in identifying phishing sites using LLM analysis
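
One simple way to see why cloaking defeats a single fixed crawler is to compare what the same URL returns under different profiles. The sketch below is an illustration under stated assumptions, not a method taken from the paper: the function names, the use of `difflib.SequenceMatcher` as the text-similarity measure, and the 0.5 threshold are all hypothetical choices, and page texts are assumed to have been fetched already.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Text similarity in [0, 1] from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def divergent_profiles(pages, baseline, threshold=0.5):
    """List profiles whose page text differs strongly from the baseline crawl.

    pages maps a profile name to the text fetched under that profile.
    threshold is an assumed cut-off, not a value from the paper.
    """
    base_text = pages[baseline]
    return sorted(
        name
        for name, text in pages.items()
        if name != baseline and similarity(base_text, text) < threshold
    )
```

A profile flagged here only indicates that the site responds differently to it; deciding whether the divergent page is actually phishing remains a separate detection step.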

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven contextual analysis for cloaking detection

Adaptive browser and network configuration optimization

Similar-case extraction to enhance phishing detection

🔎 Similar Papers

Detecting Phishing Sites Using ChatGPT