🤖 AI Summary
Phishing email detection is challenged by dynamically evolving adversarial strategies and heterogeneous attack patterns, while conventional rule-based approaches and existing machine learning models generalize and adapt poorly. To address this, we propose the first large language model (LLM)-based multi-agent collaborative detection framework, which integrates multimodal analysis of email text, URLs, and metadata. The framework includes an adversarial agent that generates context-aware phishing variants and an explanation-simplification agent that delivers interpretable outputs. Using adversarial-aware Proximal Policy Optimization (PPO) reinforcement learning, it adjusts agent weights dynamically and lets the system evolve on its own, establishing a closed-loop “detection–adversarial generation–feedback optimization” pipeline. Evaluated on public benchmarks, our method achieves 97.89% accuracy, a 2.73% false positive rate, and a 0.20% false negative rate, significantly outperforming chain-of-thought prompting, single-agent baselines, and state-of-the-art methods. This work pioneers a self-evolving phishing defense paradigm combining LLM-powered multi-agent systems with adversarial reinforcement learning.
📝 Abstract
Phishing email detection faces critical challenges from evolving adversarial tactics and heterogeneous attack patterns. Traditional detection methods, such as rule-based filters and denylists, often struggle to keep pace with these tactics, leading to false negatives and compromised security. While machine learning approaches have improved detection accuracy, they still have difficulty adapting to novel phishing strategies. We present MultiPhishGuard, a dynamic LLM-based multi-agent detection system that combines specialized expertise with adversarial-aware reinforcement learning. Our framework employs five cooperative agents (text, URL, metadata, explanation simplifier, and adversarial agents), with decision weights adjusted automatically by a Proximal Policy Optimization (PPO) reinforcement learning algorithm. To address emerging threats, we introduce an adversarial training loop in which an adversarial agent generates subtle, context-aware email variants, creating a self-improving defense ecosystem and enhancing system robustness. Experimental evaluations on public datasets demonstrate that MultiPhishGuard achieves high accuracy (97.89%) with low false positive (2.73%) and false negative (0.20%) rates, significantly outperforming Chain-of-Thought prompting, single-agent baselines, and state-of-the-art detectors, as validated by ablation studies and comparative analyses. Additionally, the explanation simplifier agent provides users with clear and easily understandable explanations of why an email is classified as phishing or legitimate. This work advances phishing defense through dynamic multi-agent collaboration and generative adversarial resilience.
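To make the idea of weighted agent decisions and the reported error rates concrete, the following is a minimal sketch and not the authors' implementation. It assumes each detection agent (text, URL, metadata) emits a phishing probability and that the final verdict is a thresholded weighted average; the abstract does not specify the fusion rule. The names `AgentVerdict`, `fuse`, and `error_rates`, the 0.5 threshold, and the fixed weights are all hypothetical; in the described system the weights would be produced by the PPO policy rather than set by hand.

```python
from dataclasses import dataclass


@dataclass
class AgentVerdict:
    """Output of one detection agent (hypothetical structure)."""
    name: str             # e.g. "text", "url", "metadata"
    phishing_prob: float  # agent's estimated probability of phishing, in [0, 1]


def fuse(verdicts, weights, threshold=0.5):
    """Combine agent probabilities with a normalized weighted average.

    Assumption: a simple weighted average stands in for whatever
    aggregation the PPO-adjusted weights feed into in the real system.
    """
    total = sum(weights[v.name] for v in verdicts)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(weights[v.name] / total * v.phishing_prob for v in verdicts)
    label = "phishing" if score >= threshold else "legitimate"
    return score, label


def error_rates(y_true, y_pred):
    """Return (accuracy, false positive rate, false negative rate).

    Labels: 1 = phishing, 0 = legitimate.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n if n else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # legitimate mail flagged
    fnr = fn / (fn + tp) if (fn + tp) else 0.0  # phishing mail missed
    return accuracy, fpr, fnr


if __name__ == "__main__":
    verdicts = [
        AgentVerdict("text", 0.9),
        AgentVerdict("url", 0.8),
        AgentVerdict("metadata", 0.2),
    ]
    # Hand-picked illustrative weights; not values from the paper.
    weights = {"text": 0.5, "url": 0.3, "metadata": 0.2}
    print(fuse(verdicts, weights))
    print(error_rates([1, 1, 0, 0], [1, 0, 0, 1]))
```

Under this reading, the false positive rate is the share of legitimate emails flagged as phishing and the false negative rate is the share of phishing emails that pass as legitimate, which is how the 2.73% and 0.20% figures are usually interpreted.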