RealHarm: A Collection of Real-World Language Model Application Failures

📅 2025-04-14
🤖 AI Summary
This paper addresses the lack of empirical failure analysis for language models deployed in consumer-facing applications. To bridge this gap, we introduce RealHarm, the first publicly available, real-world AI application failure dataset grounded in verified incidents. Our methodology involves systematic event mining from public sources and multi-dimensional human annotation, adopting a deployer-centric perspective to classify harm types (e.g., reputational damage, misinformation), root causes, and risk propagation pathways; we further empirically evaluate the efficacy of mainstream content safety guardrails against these authentic failures. Key contributions include: (1) establishing the first empirically grounded, organization-level AI failure repository; (2) revealing a substantial misalignment between regulatory frameworks and observed operational risks; (3) identifying reputational damage as the most prevalent organizational harm and misinformation as the dominant risk category; and (4) demonstrating critically low interception rates of existing safety systems against real-world failure instances.

📝 Abstract
Language model deployments in consumer-facing applications introduce numerous risks. While existing research on harms and hazards of such applications follows top-down approaches derived from regulatory frameworks and theoretical analyses, empirical evidence of real-world failure modes remains underexplored. In this work, we introduce RealHarm, a dataset of annotated problematic interactions with AI agents built from a systematic review of publicly reported incidents. Analyzing harms, causes, and hazards specifically from the deployer's perspective, we find that reputational damage constitutes the predominant organizational harm, while misinformation emerges as the most common hazard category. We empirically evaluate state-of-the-art guardrails and content moderation systems to probe whether such systems would have prevented the incidents, revealing a significant gap in the protection of AI applications.

Problem

Research questions and friction points this paper is trying to address.

Identify real-world failures in language model applications
Analyze harms and hazards from the deployer's perspective
Evaluate effectiveness of current guardrails and moderation systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces RealHarm dataset from real incidents
Analyzes harms from the deployer's perspective
Evaluates guardrails and moderation systems
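
The guardrail evaluation summarized above asks, for each recorded incident, whether a moderation system would have flagged it. A minimal sketch of that measurement follows. It is an illustration under stated assumptions, not the paper's evaluation code: the `Incident` schema, the `interception_rate` helper, the keyword-based `keyword_guardrail`, and the two toy incidents are all hypothetical and are not taken from the RealHarm dataset.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Incident:
    """One annotated problematic interaction (hypothetical schema)."""
    conversation: List[str]  # messages in chronological order
    hazard: str              # hazard category, e.g. "misinformation"
    harm: str                # harm to the deployer, e.g. "reputational damage"


def interception_rate(incidents: List[Incident],
                      guardrail: Callable[[str], bool]) -> float:
    """Fraction of incidents in which the guardrail flags at least one message."""
    if not incidents:
        return 0.0
    flagged = sum(
        1 for incident in incidents
        if any(guardrail(message) for message in incident.conversation)
    )
    return flagged / len(incidents)


def keyword_guardrail(message: str) -> bool:
    """Toy stand-in for a real moderation system: flags a fixed list of insults."""
    blocked = ("idiot", "stupid")
    text = message.lower()
    return any(word in text for word in blocked)


# Invented examples for illustration only; not RealHarm records.
toy_incidents = [
    Incident(
        conversation=[
            "Can I get a refund?",
            "Our policy lets you claim a refund within 90 days after travel.",
        ],
        hazard="misinformation",
        harm="reputational damage",
    ),
    Incident(
        conversation=[
            "Why is my order late?",
            "You are an idiot for asking that.",
        ],
        hazard="abusive language",
        harm="reputational damage",
    ),
]

rate = interception_rate(toy_incidents, keyword_guardrail)
print(f"Interception rate: {rate:.0%}")
```

In this toy setup the keyword filter flags the insult but not the fabricated refund policy, which illustrates why a content-focused guardrail can miss misinformation-type failures. A real evaluation would replace `keyword_guardrail` with calls to an actual moderation system.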