AI Summary
This paper addresses the lack of empirical failure analysis for language models deployed in consumer-facing applications. To bridge this gap, we introduce RealHarm, the first publicly available dataset of real-world AI application failures grounded in verified incidents. Our methodology combines systematic incident mining from public sources with multi-dimensional human annotation. Adopting a deployer-centric perspective, we classify harm types (e.g., reputational damage, misinformation), root causes, and risk propagation pathways. We further evaluate empirically how well mainstream content safety guardrails handle these real failures. Key contributions include: (1) establishing the first empirically grounded, organization-level AI failure repository; (2) revealing a substantial misalignment between regulatory frameworks and observed operational risks; (3) identifying reputational damage as the most prevalent organizational harm and misinformation as the dominant risk category; and (4) demonstrating critically low interception rates of existing safety systems against real-world failure instances.
Abstract
Language model deployments in consumer-facing applications introduce numerous risks. While existing research on harms and hazards of such applications follows top-down approaches derived from regulatory frameworks and theoretical analyses, empirical evidence of real-world failure modes remains underexplored. In this work, we introduce RealHarm, a dataset of annotated problematic interactions with AI agents built from a systematic review of publicly reported incidents. Analyzing harms, causes, and hazards specifically from the deployer's perspective, we find that reputational damage constitutes the predominant organizational harm, while misinformation emerges as the most common hazard category. We empirically evaluate state-of-the-art guardrails and content moderation systems to probe whether such systems would have prevented the incidents, revealing a significant gap in the protection of AI applications.
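The guardrail evaluation above comes down to asking, for each annotated incident, whether a given moderation system would have flagged it. The following is a minimal sketch of how a per-guardrail interception rate could be tallied. It assumes a hypothetical record schema (`Incident`, `flagged_by`) and placeholder guardrail names. These are illustrative only and are not the dataset's actual format or the authors' evaluation code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """One annotated problematic conversation (hypothetical schema)."""
    incident_id: str
    hazard_category: str        # e.g. "misinformation"
    flagged_by: frozenset[str]  # names of guardrails that flagged this incident


def interception_rates(incidents: list[Incident], guardrails: list[str]) -> dict[str, float]:
    """Return, per guardrail, the fraction of incidents it flagged (hypothetical metric)."""
    if not incidents:
        raise ValueError("need at least one incident")
    return {
        g: sum(g in inc.flagged_by for inc in incidents) / len(incidents)
        for g in guardrails
    }


# Toy data with placeholder guardrail names; not results from the paper.
incidents = [
    Incident("a", "misinformation", frozenset({"guardrail_a"})),
    Incident("b", "misinformation", frozenset()),
    Incident("c", "other", frozenset({"guardrail_a", "guardrail_b"})),
    Incident("d", "misinformation", frozenset()),
]
rates = interception_rates(incidents, ["guardrail_a", "guardrail_b"])
print(rates)  # {'guardrail_a': 0.5, 'guardrail_b': 0.25}
```

A low rate here means that most documented failures would have passed the guardrail unflagged. The same tally can be restricted to a single `hazard_category` to see which failure types slip through most often.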