When lies are mostly truthful: automated verbal deception detection for embedded lies

📅 2025-01-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenging problem of automatically detecting "embedded lies" (statements that interweave true and false information) in realistic settings. To this end, we construct an autobiographical corpus of 2,088 truthful and deceptive statements with annotated embedded lies, and systematically define and annotate three core attributes of those lies: centrality, deceptiveness, and information source. Our analysis reveals that roughly two-thirds of a typical deceptive statement consists of truthful content, challenging the conventional binary truth-versus-falsity classification paradigm. Methodologically, we fine-tune Llama-3-8B for text classification, integrating individual-difference modeling, linguistic statistical analysis, and interpretability techniques, including attention visualization and token-level attribution. The resulting model achieves 64% accuracy on embedded-lie detection. Empirical evaluation further shows that statements containing embedded lies are linguistically very similar to truthful statements, corroborating the intrinsic difficulty of the detection task.

📝 Abstract
Background: Verbal deception detection research relies on narratives and commonly treats statements as either truthful or deceptive. A more realistic perspective acknowledges that the veracity of a statement exists on a continuum, with truthful and deceptive parts embedded within the same statement. However, research on embedded lies has lagged behind. Methods: We collected a novel dataset of 2,088 truthful and deceptive statements with annotated embedded lies. Using a within-subjects design, participants provided a truthful account of an autobiographical event. They then rewrote their statement in a deceptive manner by including embedded lies, which they afterwards highlighted and judged on centrality, deceptiveness, and source. Results: We show that a fine-tuned language model (Llama-3-8B) can classify truthful statements and those containing embedded lies with 64% accuracy. Individual differences, linguistic properties, and explainability analyses suggest that the challenge of moving the dial towards embedded lies stems from their resemblance to truthful statements. Typical deceptive statements consisted of two-thirds truthful information and one-third embedded lies, largely derived from past personal experiences and showing minimal linguistic differences from their truthful counterparts. Conclusion: We present this dataset as a novel resource to address this challenge and to foster research on embedded lies in verbal deception detection.
Problem

Research questions and friction points this paper addresses.

Lie Detection
Truth and Deception
Automatic Speech Analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Lie Detection
Llama-3-8B Model
Annotated Dataset