Phishing the Phishers with SpecularNet: Hierarchical Graph Autoencoding for Reference-Free Web Phishing Detection

📅 2026-03-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes SpecularNet, a lightweight, end-to-end phishing webpage detection framework that operates without external knowledge bases or multimodal inputs. Using only the domain name and HTML structure, SpecularNet models the DOM as a tree and applies a hierarchical graph autoencoder with directional, level-wise message passing to capture higher-order structural invariants characteristic of phishing pages. Evaluated against 13 state-of-the-art detectors on benchmark datasets, it achieves an F1 score of 93.9%, slightly behind the best reference-based method, while cutting per-page inference time from several seconds to approximately 20 milliseconds on standard CPUs. Field and robustness evaluations further validate it in real-world deployments, on a newly collected 2026 open-world dataset, and against adversarial attacks.
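To make "models the DOM as a tree" concrete, here is a minimal sketch of turning raw HTML into a list of element nodes with parent pointers and depths, then grouping nodes by level. It uses only Python's standard `html.parser`; the names `DomTreeBuilder`, `build_dom_tree`, and `group_by_level` are hypothetical, and the summary does not say which parser or node features SpecularNet actually uses.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they are not kept on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DomTreeBuilder(HTMLParser):
    """Collects element nodes as {"tag", "parent", "depth"} records (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.nodes = []   # one record per element, in document order
        self.stack = []   # indices of currently open elements

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else -1   # -1 marks a root
        self.nodes.append({"tag": tag, "parent": parent, "depth": len(self.stack)})
        if tag not in VOID_TAGS:
            self.stack.append(len(self.nodes) - 1)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; stray closing tags are ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.nodes[self.stack[i]]["tag"] == tag:
                del self.stack[i:]
                break


def build_dom_tree(html):
    """Parse an HTML string and return the list of node records."""
    builder = DomTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.nodes


def group_by_level(nodes):
    """Map each depth to the indices of the nodes found at that depth."""
    levels = {}
    for index, node in enumerate(nodes):
        levels.setdefault(node["depth"], []).append(index)
    return levels


if __name__ == "__main__":
    page = ('<html><body><div><form><input type="password"><br></form></div>'
            '<p>hi</p></body></html>')
    tree = build_dom_tree(page)
    for depth, members in sorted(group_by_level(tree).items()):
        print(depth, [tree[i]["tag"] for i in members])
```

Grouping nodes by depth is what lets a hierarchical model process one level at a time. A real pipeline would also attach per-node features (tag type, attributes, the domain name), which this sketch leaves out.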

📝 Abstract

Phishing remains the most pervasive threat to the Web, enabling large-scale credential theft and financial fraud through deceptive webpages. While recent reference-based and generative-AI-driven phishing detectors achieve strong accuracy, their reliance on external knowledge bases, cloud services, and complex multimodal pipelines fundamentally limits practicality, scalability, and reproducibility. In contrast, conventional deep learning approaches often fail to generalize to evolving phishing campaigns. We introduce SpecularNet, a novel lightweight framework for reference-free web phishing detection that demonstrates how carefully designed compact architectures can rival heavyweight systems. SpecularNet operates solely on the domain name and HTML structure, modeling the Document Object Model (DOM) as a tree and leveraging a hierarchical graph autoencoding architecture with directional, level-wise message passing. This design captures higher-order structural invariants of phishing webpages while enabling fast, end-to-end inference on standard CPUs. Extensive evaluation against 13 state-of-the-art phishing detectors, including leading reference-based systems, shows that SpecularNet achieves competitive detection performance with dramatically lower computational cost. On benchmark datasets, it reaches an F1 score of 93.9%, trailing the best reference-based method slightly while reducing inference time from several seconds to approximately 20 milliseconds per webpage. Field and robustness evaluations further validate SpecularNet in real-world deployments, on a newly collected 2026 open-world dataset, and against adversarial attacks.
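The phrase "directional, level-wise message passing" can be illustrated with a small sketch: information first flows from leaves toward the root one depth level at a time, then back from the root toward the leaves. Everything here is an assumption made for illustration. The function name `levelwise_message_passing`, the parent-array tree encoding, and the fixed 50/50 averaging update are stand-ins for the learned weights and nonlinearities a real model would use, and the abstract does not specify SpecularNet's actual update rule or its autoencoding objective.

```python
from collections import defaultdict


def levelwise_message_passing(parents, features):
    """Run one bottom-up and one top-down sweep over a rooted tree.

    parents[i] is the parent index of node i (-1 for the root).
    features[i] is a list of floats for node i.
    Fixed averaging replaces learned parameters (an illustrative assumption).
    """
    n = len(parents)
    if n == 0:
        return []

    children = defaultdict(list)
    for node, parent in enumerate(parents):
        if parent >= 0:
            children[parent].append(node)

    # Depth of every node, found by walking up to the root.
    depth = []
    for node in range(n):
        d, cur = 0, node
        while parents[cur] >= 0:
            cur = parents[cur]
            d += 1
        depth.append(d)
    max_depth = max(depth)
    levels = [[i for i in range(n) if depth[i] == d] for d in range(max_depth + 1)]

    h = [list(f) for f in features]  # working copy of the node states

    # Bottom-up sweep: each node mixes its state with the mean of its children.
    for d in range(max_depth - 1, -1, -1):
        for node in levels[d]:
            kids = children[node]
            if kids:
                mean_kids = [sum(h[k][j] for k in kids) / len(kids)
                             for j in range(len(h[node]))]
                h[node] = [0.5 * (a + b) for a, b in zip(h[node], mean_kids)]

    # Top-down sweep: each non-root node mixes its state with its parent's.
    for d in range(1, max_depth + 1):
        for node in levels[d]:
            parent_state = h[parents[node]]
            h[node] = [0.5 * (a + b) for a, b in zip(h[node], parent_state)]

    return h


if __name__ == "__main__":
    # Tree: node 0 is the root, nodes 1 and 2 are its children, node 3 is a child of 1.
    print(levelwise_message_passing([-1, 0, 0, 1], [[0.0], [2.0], [4.0], [6.0]]))
```

Processing whole levels in a fixed direction means each sweep costs time linear in the number of DOM nodes, which is consistent with the abstract's emphasis on fast CPU inference, although the paper's actual cost profile depends on details not given here.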

Problem

Research questions and friction points this paper is trying to address.

phishing detection

reference-free

web security

DOM structure

generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical graph autoencoding

reference-free phishing detection

DOM tree modeling