Characterizing Phishing Pages by JavaScript Capabilities

📅 2025-09-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Phishing kits let attackers deploy pages at scale, while kit identification still relies on manual analysis. This paper proposes an automated method for grouping phishing pages by their underlying phishing kit, based on the complexity of their JavaScript logic. The approach integrates dynamic execution analysis, code similarity measurement, and unsupervised learning, combining static and dynamic features. On a ground-truth dataset of 548 kit families deployed across 4,562 phishing URLs, the system detects kits with 97% accuracy; on an unlabeled dataset, it groups 434,050 phishing pages into 11,377 clusters and annotates each cluster with the phishing techniques it employs. The paper also reports quantitative findings at the cluster level: UI interactivity appears in 90% of clusters and basic fingerprinting in 80%, whereas mouse detection via the browser's mouse API is among the rarest behaviors. By replacing per-page manual inspection with scalable, reproducible automation, the work gives researchers and analysts an empirical basis for tracking how phishing techniques evolve and for adapting defenses.

📝 Abstract

In 2024, the Anti-Phishing Working Group identified over one million phishing pages. Phishers achieve this scale by using phishing kits (ready-to-deploy phishing websites) to rapidly launch phishing campaigns with specific data exfiltration, evasion, or mimicry techniques. In contrast, researchers and defenders continue to fight phishing on a page-by-page basis and rely on manual analysis to recognize static features for kit identification. This paper aims to aid researchers and analysts by automatically differentiating groups of phishing pages based on the underlying kit, automating a previously manual process, and enabling us to measure how popular different client-side techniques are across these groups. For kit detection, our system has an accuracy of 97% on a ground-truth dataset of 548 kit families deployed across 4,562 phishing URLs. On an unlabeled dataset, we leverage the complexity of 434,050 phishing pages' JavaScript logic to group them into 11,377 clusters, annotating each cluster with the phishing techniques it employs. We find that UI interactivity and basic fingerprinting are near-universal techniques, present in 90% and 80% of the clusters, respectively. On the other hand, mouse detection via the browser's mouse API is among the rarest behaviors, despite being used in a deployment of a 7-year-old open-source phishing kit. Our methods and findings provide new ways for researchers and analysts to tackle the volume of phishing pages.

Problem

Research questions and friction points this paper is trying to address.

Automatically differentiate phishing pages by underlying kit

Automate manual process for kit identification

Measure popularity of client-side techniques across groups
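
The prevalence figures in the abstract (90% and 80%) are fractions of clusters, not of individual pages. A minimal sketch of that kind of measurement follows. It assumes each cluster has already been annotated with a set of technique labels; the function name, the label strings, and the input layout are hypothetical and are not taken from the paper.

```python
from collections import Counter


def technique_prevalence(cluster_techniques):
    """Return the fraction of clusters that exhibit each technique.

    cluster_techniques maps a cluster id to the set of technique labels
    observed in that cluster (hypothetical layout, for illustration only).
    """
    total = len(cluster_techniques)
    if total == 0:
        return {}
    counts = Counter()
    for techniques in cluster_techniques.values():
        # Using a set means each technique counts once per cluster,
        # no matter how many pages in the cluster use it.
        counts.update(set(techniques))
    return {name: n / total for name, n in counts.items()}
```

Counting per cluster rather than per page keeps one heavily deployed kit from dominating the statistic, which may be why the abstract reports cluster-level percentages.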

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically groups phishing pages by kit

Uses JavaScript logic complexity for clustering

Achieves 97% accuracy in kit detection
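
The summary does not specify how JavaScript logic is turned into features or which clustering algorithm is used. As a rough illustration of the general idea of grouping pages by what their JavaScript does, the sketch below represents each page as a set of observed browser API names and greedily groups pages whose sets have high Jaccard similarity. The feature representation, the `threshold` value, and the greedy strategy are all assumptions made for illustration, not the paper's method.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_pages(page_features, threshold=0.8):
    """Greedily group pages by similarity of their JavaScript feature sets.

    page_features maps a page id to a set of feature strings (here,
    hypothetical browser API names seen in the page's scripts).
    Each page joins the first cluster whose representative is at least
    `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []  # list of (representative feature set, member page ids)
    for page_id, features in page_features.items():
        for representative, members in clusters:
            if jaccard(representative, features) >= threshold:
                members.append(page_id)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append((features, [page_id]))
    return [members for _, members in clusters]
```

A greedy single pass is order-dependent and is chosen here only for brevity; a production system would more likely use a standard unsupervised clustering algorithm over richer features.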

🔎 Similar Papers

Evaluating the Effectiveness and Robustness of Visual Similarity-based Phishing Detection Models

2024-05-30 | arXiv.org | Citations: 0
