A Lightweight Defense Mechanism against Next Generation of Phishing Emails using Distilled Attention-Augmented BiLSTM

📅 2026-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge posed by highly realistic phishing emails generated by large language models (LLMs), which often evade traditional text-based detection systems. To this end, the authors propose a lightweight phishing detection method based on knowledge distillation, wherein a MobileBERT model is distilled into an attention-augmented BiLSTM architecture with only 4.5 million parameters. The resulting model enables real-time, privacy-preserving detection at both endpoint and gateway levels without requiring hardware acceleration. By incorporating a multi-head attention mechanism and training on a hybrid dataset that includes LLM-generated phishing samples, the model achieves competitive performance: under five evaluation protocols, it incurs only a 1–2.5 point drop in weighted F1 score compared to state-of-the-art Transformer baselines, while offering 80–95% faster inference and a 95–99% reduction in model size.

📝 Abstract
The current generation of large language models produces sophisticated social-engineering content that bypasses standard text-screening systems in business communication platforms. We propose a privacy-preserving deception-detection solution for mail gateways and endpoints that meets the performance requirements of network and mobile security systems. A fine-tuned MobileBERT teacher is distilled into a BiLSTM student with multi-head attention that preserves semantic discrimination with only 4.5 million parameters. The hybrid training dataset combines human-written messages with LLM-generated paraphrases that apply masking and personalization techniques, improving robustness to modern attacks. Evaluation spans five protocols: human-only and LLM-only tests, two cross-distribution transfer tests, and a production-like mixed-traffic test, covering native, cross-distribution, and combined-traffic scenarios. On the mixture split, the distilled model stays within 1-2.5 weighted-F1 points of strong Transformer baselines, including ModernBERT, DeBERTaV3-base, T5-base, DeepSeek-R1 Distill Qwen-1.5B, and Phi-4 mini, while achieving 80-95% faster inference and 95-99% smaller model size. This combination of accuracy, low latency, and compact size enables real-time filtering without acceleration hardware and supports policy-based management. The paper also examines performance under high traffic, privacy-protection measures, and practical deployment considerations.
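The page does not reproduce the paper's training objective, but a MobileBERT-to-BiLSTM setup like the one described typically uses standard soft-label knowledge distillation: the student is trained against a blend of the hard labels and the teacher's temperature-softened output distribution. The sketch below is illustrative only; the temperature `T` and mixing weight `alpha` are assumptions, not values from the paper.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T softens the distribution."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton-style KD: alpha * CE(hard labels) + (1-alpha) * T^2 * KL(teacher || student)."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL divergence between softened teacher and student distributions
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                             - np.log(p_student + 1e-12)), axis=-1).mean()
    # Ordinary cross-entropy against the ground-truth (phishing / benign) labels
    hard = softmax(student_logits)
    ce = -np.log(hard[np.arange(len(labels)), labels] + 1e-12).mean()
    # T^2 rescaling keeps the soft-target gradient magnitude comparable
    return alpha * ce + (1 - alpha) * T**2 * kl
```

When the student's logits match the teacher's, the KL term vanishes and only the hard-label cross-entropy remains, which is why the blended loss pulls the student toward the teacher's decision boundary rather than just the labels.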
Problem

Research questions and friction points this paper is trying to address.

phishing emails
large language models
social engineering
email security
text screening
Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge Distillation
Attention-Augmented BiLSTM
Phishing Email Detection
Lightweight Model
Cross-Distribution Robustness
Morteza Eskandarian
Canadian Institute for Cybersecurity, Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada
Mahdi Rabbani
Research Scientist, Canadian Institute for Cybersecurity, UNB | Dalhousie University
AI for Cybersecurity, Knowledge Distillation, Graph Neural Networks, Malware Analysis
Arun Kaniyamattam
Canadian Institute for Cybersecurity, Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada
Fatemeh Nejati
Canadian Institute for Cybersecurity, Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada
Mansur Mirani
Mastercard Vancouver Tech Hub, Vancouver, British Columbia, Canada
Gunjan Piya
Mastercard Vancouver Tech Hub, Vancouver, British Columbia, Canada
Igor Opushnyev
Mastercard Vancouver Tech Hub, Vancouver, British Columbia, Canada
Ali A. Ghorbani
Professor and Canada Research Chair in Cybersecurity
Cybersecurity, Machine Learning
Sajjad Dadkhah
Canada Mastercard IoT Research Chair | Assistant Professor | Interim Associate Director at CIC, UNB
Cybersecurity, Digital multimedia security, NLP, IoT security, ML security