From Hallucination to Scheming: A Unified Taxonomy and Benchmark Analysis for LLM Deception

📅 2026-04-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the fragmented research landscape and inconsistent terminology surrounding misleading outputs of large language models—particularly hallucinations and strategic deception—by proposing the first unified three-dimensional framework that integrates both phenomena. The framework systematically characterizes these behaviors along the axes of goal-directedness, target of deception, and underlying mechanism. Through qualitative categorization and multidimensional mapping of 50 existing benchmarks, the analysis reveals a predominant focus on factual fabrication, with significant gaps in coverage of pragmatic distortion, attribution errors, and self-awareness of capabilities. Building on these insights, the work offers actionable evaluation and reporting templates for developers and regulators to advance standardization in model assessment practices.
📝 Abstract
Large language models (LLMs) produce systematically misleading outputs, from hallucinated citations to strategic deception of evaluators, yet these phenomena are studied by separate communities with incompatible terminology. We propose a unified taxonomy organized along three complementary dimensions: degree of goal-directedness (behavioral to strategic deception), object of deception, and mechanism (fabrication, omission, or pragmatic distortion). Applying this taxonomy to 50 existing benchmarks reveals that every benchmark tests fabrication while pragmatic distortion, attribution, and capability self-knowledge remain critically under-covered, and strategic deception benchmarks are nascent. We offer concrete recommendations for developers and regulators, including a minimal reporting template for positioning future work within our framework.
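The three taxonomy axes named in the abstract (goal-directedness, object of deception, mechanism) lend themselves to a structured reporting record of the kind the authors recommend. A minimal sketch of such a record, assuming the axis values listed in the abstract; all identifiers below (`GoalDirectedness`, `Mechanism`, `BenchmarkEntry`, the example benchmark name) are hypothetical illustrations, not the paper's actual schema:

```python
from dataclasses import dataclass
from enum import Enum

class GoalDirectedness(Enum):
    # Abstract describes a spectrum from behavioral to strategic deception.
    BEHAVIORAL = "behavioral"  # e.g. an unprompted hallucinated citation
    STRATEGIC = "strategic"    # e.g. deliberately misleading an evaluator

class Mechanism(Enum):
    # The three mechanisms named in the abstract.
    FABRICATION = "fabrication"
    OMISSION = "omission"
    PRAGMATIC_DISTORTION = "pragmatic_distortion"

@dataclass
class BenchmarkEntry:
    """One benchmark positioned along the taxonomy's three dimensions."""
    name: str
    goal_directedness: GoalDirectedness
    object_of_deception: str  # free-form here, e.g. "user" or "evaluator"
    mechanism: Mechanism

# A hypothetical fabrication-focused benchmark, the category the paper
# finds every surveyed benchmark already covers.
entry = BenchmarkEntry(
    name="ExampleHallucinationBench",
    goal_directedness=GoalDirectedness.BEHAVIORAL,
    object_of_deception="user",
    mechanism=Mechanism.FABRICATION,
)
print(entry.mechanism.value)  # fabrication
```

A registry of such entries would make the paper's coverage-gap analysis mechanical: counting entries per `Mechanism` value immediately exposes the under-covered pragmatic-distortion category.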
Problem

Research questions and friction points this paper addresses.

LLM deception
hallucination
strategic deception
unified taxonomy
benchmark analysis
Innovation

Methods, ideas, and system contributions that make the work stand out.

unified taxonomy
LLM deception
strategic deception
pragmatic distortion
benchmark analysis