🤖 AI Summary
Existing lie detection methods for large language models (LLMs) are typically evaluated in constrained scenarios that fail to capture the diversity of deceptive behavior. Method: The authors introduce LIARS' BENCH, a large-scale benchmark for lie detection comprising 72,863 honest and deceptive responses generated by four open-weight LLMs across seven datasets. The settings capture qualitatively different types of lies and vary along two dimensions, the model's reason for lying and the object of belief targeted by the lie, including lies that cannot be identified from the transcript alone. Contribution/Results: Evaluating black- and white-box detection techniques, the authors find that existing methods systematically fail to identify certain types of lies, especially those indistinguishable at the output level. LIARS' BENCH exposes these limitations and provides a practical, extensible testbed for guiding progress in lie detection.
📝 Abstract
Prior work has introduced techniques for detecting when large language models (LLMs) lie, that is, generate statements they believe are false. However, these techniques are typically validated in narrow settings that do not capture the diverse lies LLMs can generate. We introduce LIARS' BENCH, a testbed consisting of 72,863 examples of lies and honest responses generated by four open-weight models across seven datasets. Our settings capture qualitatively different types of lies and vary along two dimensions: the model's reason for lying and the object of belief targeted by the lie. Evaluating three black- and white-box lie detection techniques on LIARS' BENCH, we find that existing techniques systematically fail to identify certain types of lies, especially in settings where it is not possible to determine whether the model lied from the transcript alone. Overall, LIARS' BENCH reveals limitations in prior techniques and provides a practical testbed for guiding progress in lie detection.