Evaluating Human and Machine Confidence in Phishing Email Detection: A Comparative Study

📅 2026-01-08

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates how human–machine collaboration can improve the accuracy and trustworthiness of phishing email detection. It systematically compares human evaluators with three interpretable machine learning models (logistic regression, decision trees, and random forests), each trained on TF-IDF features and on semantic embeddings. The comparison reveals key differences in detection behavior: the rationales behind judgments, the consistency of confidence, and the use of linguistic cues. The models achieve higher accuracy, but their confidence varies more. Humans, in contrast, show more stable confidence and draw on a richer set of linguistic signals. Age significantly affects detection performance, whereas language proficiency has minimal impact. These findings provide an empirical basis and design guidance for building trustworthy human–machine collaborative cybersecurity systems.
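
The summary contrasts how stable human and model confidence is. The sketch below shows one simple way to quantify that, not necessarily the metric the paper uses. It assumes every evaluator's confidence is expressed on [0, 1]. For a model, that is the probability of its predicted class. For a human, it is a rating rescaled from a hypothetical 1 to 5 Likert scale. The mean and sample standard deviation of those scores then summarize confidence level and consistency, with a lower standard deviation meaning steadier confidence. The helper names `confidence_stability` and `rescale_likert` are illustrative only.

```python
import statistics


def rescale_likert(rating, low=1, high=5):
    """Map a rating on a hypothetical low..high Likert scale onto [0, 1]."""
    if not low <= rating <= high:
        raise ValueError("rating is outside the scale")
    return (rating - low) / (high - low)


def confidence_stability(scores):
    """Return the mean and sample standard deviation of confidence scores.

    Each score must lie in [0, 1]. A lower standard deviation indicates
    that the evaluator's confidence is more consistent across emails.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")
    return {
        "mean": statistics.fmean(scores),
        "stdev": statistics.stdev(scores),  # sample (n - 1) standard deviation
    }


if __name__ == "__main__":
    # Toy values for illustration only; these are NOT data from the study.
    human = [rescale_likert(r) for r in [4, 4, 5, 4, 4]]
    model = [0.51, 0.97, 0.62, 0.99, 0.55]  # e.g. max class probabilities
    print("human:", confidence_stability(human))
    print("model:", confidence_stability(model))
```

Putting both evaluators on the same [0, 1] scale is what makes their standard deviations comparable. A real analysis would also pair each score with whether the judgment was correct, so that stability is not mistaken for good calibration.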

📝 Abstract

Identifying deceptive content such as phishing emails demands sophisticated cognitive processes that combine pattern recognition, confidence assessment, and contextual analysis. This research examines how human cognition and machine learning models work together to distinguish phishing emails from legitimate ones. We employed three interpretable algorithms (Logistic Regression, Decision Trees, and Random Forests), trained them on both TF-IDF features and semantic embeddings, and compared their predictions against human evaluations that captured confidence ratings and linguistic observations. Our results show that the machine learning models achieve good accuracy, but their confidence levels vary significantly. Human evaluators, on the other hand, use a greater variety of linguistic cues and maintain more consistent confidence. We also found that while language proficiency has minimal effect on detection performance, age does. These findings offer helpful direction for creating transparent AI systems that complement human cognitive functions, ultimately improving human-AI cooperation in challenging content analysis tasks.
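
The abstract describes training interpretable classifiers on TF-IDF features and reading off their confidence. The sketch below illustrates only the TF-IDF plus logistic regression part of that setup, in dependency-free Python. It assumes raw term counts, a smoothed IDF of ln((1 + n) / (1 + df)) + 1, L2-normalized vectors, and full-batch gradient descent. Decision trees, random forests, and semantic embeddings are not shown. All function names, the hyperparameters (`epochs`, `lr`, `l2`), and the four training sentences are hypothetical, not the paper's pipeline or data. Model confidence is taken as the probability of the predicted class, max(p, 1 - p).

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def fit_idf(docs):
    """Smoothed IDF per term: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse L2-normalized TF-IDF vector; terms unseen in training are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def phishing_probability(x, w, b):
    """P(phishing | x) under a linear logistic model with weights w and bias b."""
    return sigmoid(b + sum(w.get(t, 0.0) * v for t, v in x.items()))


def confidence(p):
    # Confidence in the predicted label = probability of the winning class.
    return max(p, 1.0 - p)


def train_logreg(vectors, labels, epochs=200, lr=0.5, l2=0.01):
    """Full-batch gradient descent on L2-regularized logistic (log) loss."""
    w = defaultdict(float)
    b = 0.0
    n = len(vectors)
    for _ in range(epochs):
        grad_w = defaultdict(float)
        grad_b = 0.0
        for x, y in zip(vectors, labels):
            err = phishing_probability(x, w, b) - y  # d(log loss)/d(logit)
            for term, value in x.items():
                grad_w[term] += err * value
            grad_b += err
        for term in set(w) | set(grad_w):
            w[term] -= lr * (grad_w[term] / n + l2 * w[term])
        b -= lr * grad_b / n
    return dict(w), b


if __name__ == "__main__":
    # Hypothetical toy emails (1 = phishing, 0 = legitimate), not study data.
    texts = [
        "Verify your account password now, urgent: click the link",
        "Your account is suspended. Click here to verify your password",
        "Meeting notes attached for tomorrow's project review",
        "Lunch on Friday with the project team",
    ]
    labels = [1, 1, 0, 0]
    idf = fit_idf(texts)
    w, b = train_logreg([tfidf(t, idf) for t in texts], labels)
    for email in ["urgent: verify your password", "project meeting on friday"]:
        p = phishing_probability(tfidf(email, idf), w, b)
        print(f"{email!r}: P(phishing)={p:.2f}, confidence={confidence(p):.2f}")
```

In practice, a library such as scikit-learn (`TfidfVectorizer`, `LogisticRegression.predict_proba`) would replace these hand-written helpers. The explicit version is shown only to make the link between feature weights and per-email confidence visible.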

Problem

Research questions and friction points this paper is trying to address.

phishing email detection

human confidence

machine confidence

human-AI collaboration

interpretable machine learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

human-AI collaboration

confidence calibration

interpretable machine learning