🤖 AI Summary
This study addresses the challenge that end users face in comprehending cybersecurity alerts, including those rewritten by large language models (LLMs). To this end, we propose the Human-Centered Security Alert Evaluation Framework (HCSAEF). Methodologically, HCSAEF integrates human factors engineering principles, natural language metrics, LLM output parsing, semantic consistency verification, and behavior-oriented empirical user studies. Its core contribution is the first quantifiable, multi-dimensional evaluation system for alert intuitiveness, urgency, and correctness, enabling comparative analysis across prompts, models, and output consistency, along with root-cause attribution of alert quality. Evaluated on three representative use cases, HCSAEF effectively discriminates the impacts of prompt engineering, model selection, and output stability on alert quality. The results demonstrate its strong discriminative power and practical utility in assessing the intuitiveness, urgency, and factual correctness of LLM-generated security notifications.
📝 Abstract
Due to the increasing presence of networked devices in everyday life, not only cybersecurity specialists but also end users benefit from security applications such as firewalls, vulnerability scanners, and intrusion detection systems. Recent approaches use large language models (LLMs) to rewrite brief, technical security alerts in intuitive language and suggest actionable measures, helping everyday users understand and respond appropriately to security risks. However, it remains an open question how well such alerts explain the underlying issues to users, and LLM outputs can be hallucinated, inconsistent, or misleading. In this work, we introduce the Human-Centered Security Alert Evaluation Framework (HCSAEF). HCSAEF assesses LLM-generated cybersecurity notifications to support researchers who want to compare notifications generated for everyday users, improve them, or analyze the capabilities of different LLMs in explaining cybersecurity issues. We demonstrate HCSAEF through three use cases, which allow us to quantify the impact of prompt design, model selection, and output consistency. Our findings indicate that HCSAEF effectively differentiates generated notifications along dimensions such as intuitiveness, urgency, and correctness.