X-MAP: eXplainable Misclassification Analysis and Profiling for Spam and Phishing Detection

📅 2026-02-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses misclassification in spam and phishing detection, where false negatives expose users to attacks and false positives erode trust. The authors propose X-MAP, an interpretable method that combines SHAP-based feature attribution with non-negative matrix factorization (NMF) to give topic-level semantic explanations of misclassified messages. It builds topic profiles for reliably classified spam/phishing and legitimate messages and uses Jensen–Shannon divergence to measure how far each message deviates from those profiles, supporting both diagnosis and correction. On SMS and phishing datasets, misclassified messages show at least twice the divergence of correctly classified ones. As a detector, the method reaches up to 0.98 AUROC and a false-rejection rate of 0.089 at 95% TRR on positive predictions; as a repair layer on base detectors, it recovers up to 97% of falsely rejected correct predictions with moderate leakage.
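
The divergence step can be illustrated with a minimal sketch. It assumes each message and each class profile has already been reduced to a vector of non-negative topic weights; the three-topic vectors and the names `legit_profile`, `spam_profile`, and `message` are hypothetical and not taken from the paper, and the paper's exact scoring rule may differ.

```python
import math

def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, bounded in [0, 1]."""
    p, q = normalize(p), normalize(q)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture of the two distributions
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical topic profiles (assumed values, for illustration only).
legit_profile = [0.7, 0.2, 0.1]
spam_profile = [0.1, 0.2, 0.7]
message = [0.6, 0.3, 0.1]

d_legit = js_divergence(message, legit_profile)
d_spam = js_divergence(message, spam_profile)
print(f"divergence from legitimate profile: {d_legit:.3f}")
print(f"divergence from spam profile:       {d_spam:.3f}")
```

In this toy case the message sits much closer to the legitimate profile, so a spam prediction for it would carry a large divergence from its predicted class, which is the kind of signal the summary describes.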

📝 Abstract
Misclassifications in spam and phishing detection are harmful: false negatives expose users to attacks, while false positives degrade trust. Existing uncertainty-based detectors can flag potential errors, but they can be deceived and offer limited interpretability. This paper presents X-MAP, an eXplainable Misclassification Analysis and Profiling framework that reveals topic-level semantic patterns behind model failures. X-MAP combines SHAP-based feature attributions with non-negative matrix factorization to build interpretable topic profiles for reliably classified spam/phishing and legitimate messages, and measures each message's deviation from these profiles using Jensen-Shannon divergence. Experiments on SMS and phishing datasets show that misclassified messages exhibit at least twice the divergence of correctly classified ones. As a detector, X-MAP achieves up to 0.98 AUROC and lowers the false-rejection rate at 95% TRR to 0.089 on positive predictions. When used as a repair layer on base detectors, it recovers up to 97% of falsely rejected correct predictions with moderate leakage. These results demonstrate X-MAP's effectiveness and interpretability for improving spam and phishing detection.
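
Using a divergence score as a misclassification detector is typically evaluated with AUROC, which asks how often a misclassified message receives a higher score than a correctly classified one. The sketch below computes it by pairwise comparison; the scores and labels are made-up illustrative values, not results from the paper, and the labeling convention (1 = misclassified) is an assumption.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative.

    labels: 1 for a misclassified message, 0 for a correctly classified one
    (assumed convention). Ties between a positive and a negative count half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical divergence scores for six messages (illustrative only).
divergences = [0.41, 0.35, 0.12, 0.08, 0.30, 0.10]
misclassified = [1, 1, 0, 0, 0, 0]
print(f"AUROC: {auroc(divergences, misclassified):.2f}")
```

The quadratic pairwise loop is chosen for readability; a rank-based computation gives the same value more efficiently on large datasets.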
Problem

Research questions and friction points this paper is trying to address.

misclassification
spam detection
phishing detection
false negatives
false positives
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explainable AI
Misclassification Analysis
Non-negative Matrix Factorization
SHAP
Jensen-Shannon Divergence