Efficient Phishing URL Detection Using Graph-based Machine Learning and Loopy Belief Propagation

📅 2025-01-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Traditional string-based URL features are not robust for phishing detection because attackers can evade them through obfuscation. To address this, the paper proposes a graph-based model that jointly encodes URL structure and network-level entities (e.g., IP addresses and authoritative name servers). It introduces an edge potential mechanism that adapts dynamically to entity similarity and label relationships, together with a loopy belief propagation (LBP) algorithm that uses an enhanced convergence strategy for stable message passing over complex graph structures. Evaluated on real-world datasets, the method achieves an F1 score of up to 98.77%. The authors describe the approach as robust and reproducible.

📝 Abstract

The proliferation of mobile devices and online interactions has been accompanied by a range of cyberattacks, among which phishing attacks and malicious Uniform Resource Locators (URLs) pose significant risks to user security. Traditional phishing URL detection methods rely primarily on URL string-based features, which attackers often manipulate to evade detection. To address these limitations, we propose a novel graph-based machine learning model for phishing URL detection that integrates both URL structure and network-level features such as IP addresses and authoritative name servers. Our approach leverages Loopy Belief Propagation (LBP) with an enhanced convergence strategy to enable effective message passing and stable classification in the presence of complex graph structures. Additionally, we introduce a refined edge potential mechanism that adapts dynamically to entity similarity and label relationships to further improve classification accuracy. Comprehensive experiments on real-world datasets demonstrate the model's effectiveness, with an F1 score of up to 98.77%. This robust and reproducible method advances phishing detection capabilities and offers enhanced reliability and valuable insights for cybersecurity.
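As a rough illustration of the message passing described in the abstract, the sketch below runs sum-product LBP over a small graph of URLs, an IP address, and a name server. It is not the authors' implementation: the `edge_potential` formula, the use of message damping as the convergence aid, and all node names, priors, and similarity values are assumptions chosen for the example.

```python
def edge_potential(similarity, same_label, eps=0.05):
    """Hypothetical similarity-scaled potential (an assumption, not the paper's formula).

    Higher similarity makes it more favourable for two connected nodes to share a label.
    """
    agree = 0.5 + (0.5 - eps) * similarity
    return agree if same_label else 1.0 - agree


def loopy_bp(nodes, edges, priors, damping=0.5, max_iters=100, tol=1e-6):
    """Return {node: P(phishing)} after damped sum-product LBP.

    nodes:  list of node ids
    edges:  {(u, v): similarity in [0, 1]} for undirected edges
    priors: {node: prior P(phishing)}; nodes without a prior default to 0.5
    """
    neighbors = {n: [] for n in nodes}
    sim = {}
    for (u, v), s in edges.items():
        neighbors[u].append(v)
        neighbors[v].append(u)
        sim[(u, v)] = sim[(v, u)] = s

    # msgs[(u, v)] = [m(benign), m(phishing)] sent from u to v, initially uniform.
    msgs = {(u, v): [0.5, 0.5] for u in nodes for v in neighbors[u]}
    phi = {n: [1.0 - priors.get(n, 0.5), priors.get(n, 0.5)] for n in nodes}

    for _ in range(max_iters):
        new_msgs, max_delta = {}, 0.0
        for (u, v), old in msgs.items():
            out = [0.0, 0.0]
            for xv in (0, 1):
                for xu in (0, 1):
                    # Product of messages into u from every neighbour except v.
                    incoming = 1.0
                    for w in neighbors[u]:
                        if w != v:
                            incoming *= msgs[(w, u)][xu]
                    psi = edge_potential(sim[(u, v)], xu == xv)
                    out[xv] += phi[u][xu] * psi * incoming
            z = out[0] + out[1]
            out = [o / z for o in out]
            # Damping (assumed convergence aid): blend the old and new message.
            damped = [damping * a + (1.0 - damping) * b for a, b in zip(old, out)]
            max_delta = max(max_delta, abs(damped[1] - old[1]))
            new_msgs[(u, v)] = damped
        msgs = new_msgs
        if max_delta < tol:
            break

    beliefs = {}
    for n in nodes:
        b = list(phi[n])
        for w in neighbors[n]:
            b[0] *= msgs[(w, n)][0]
            b[1] *= msgs[(w, n)][1]
        beliefs[n] = b[1] / (b[0] + b[1])
    return beliefs


# Toy graph with a cycle: url_a - ip_1 - url_b - ns_1 - url_a (all values illustrative).
nodes = ["url_a", "url_b", "ip_1", "ns_1"]
edges = {
    ("url_a", "ip_1"): 0.9,
    ("url_b", "ip_1"): 0.9,
    ("url_a", "ns_1"): 0.8,
    ("url_b", "ns_1"): 0.8,
}
priors = {"url_a": 0.99}  # url_a is a known phishing URL; the rest are unlabeled
beliefs = loopy_bp(nodes, edges, priors)
```

In this toy graph the unlabeled `url_b` shares an IP address and a name server with a known phishing URL, so its phishing belief rises above the neutral prior of 0.5. Damping is only one common way to stabilise LBP on graphs with cycles; the paper's own convergence strategy may differ.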

Problem

Research questions and friction points this paper is trying to address.

Cybersecurity

Phishing Attacks

Fake Websites

Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-based Machine Learning

Loopy Belief Propagation

Dynamic Edge Potential Mechanism
