🤖 AI Summary
This study systematically evaluates large language models (LLMs) and large multimodal models (LMMs) on multimodal deception detection across diverse real-world scenarios, including courtroom testimony, interpersonal deception, and fake reviews. We propose a unified benchmark framework integrating the RLTD, MU3D, and OpSpam datasets, and incorporate zero-/few-shot learning, similarity-driven in-context example selection, chain-of-thought prompting, nonverbal feature fusion, and video-summarization enhancement. To our knowledge, this is the first comprehensive comparison of open-source and commercial models on this task. Results show that fine-tuned LLMs achieve state-of-the-art performance on text-only lie detection, while LMMs, despite their cross-modal potential, significantly underperform unimodal baselines, revealing critical bottlenecks in exploiting visual and temporal cues. Auxiliary features and structured reasoning improve accuracy and interpretability, yet cross-modal generalization remains limited. Our work establishes a rigorous benchmark, introduces effective methodological advances, and delivers key insights for trustworthy AI-based deception detection.
📝 Abstract
Detecting deception in an increasingly digital world is both a critical and challenging task. In this study, we present a comprehensive evaluation of the automated deception detection capabilities of Large Language Models (LLMs) and Large Multimodal Models (LMMs) across diverse domains. We assess the performance of both open-source and commercial LLMs on three distinct datasets: real-life trial interviews (RLTD), instructed deception in interpersonal scenarios (MU3D), and deceptive reviews (OpSpam). We systematically analyze the effectiveness of different experimental setups for deception detection, including zero-shot and few-shot approaches with random or similarity-based in-context example selection. Our results show that fine-tuned LLMs achieve state-of-the-art performance on textual deception detection tasks, while LMMs struggle to fully leverage cross-modal cues. Additionally, we analyze the impact of auxiliary features, such as non-verbal gestures and video summaries, and examine the effectiveness of different prompting strategies, including direct label generation and chain-of-thought reasoning. Our findings provide key insights into how LLMs process and interpret deceptive cues across modalities, highlighting their potential and limitations in real-world deception detection applications.
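The similarity-based in-context example selection described above can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: a bag-of-words cosine similarity replaces the learned sentence embeddings a real retrieval setup would use, and the statements and labels are invented for the example.

```python
# Sketch: pick the k labeled examples most similar to the query statement,
# then assemble a few-shot deception-detection prompt from them.
import math
from collections import Counter

def bow_vector(text):
    """Lowercased bag-of-words term counts (stand-in for a sentence embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_examples(query, pool, k=2):
    """Return the k pool entries most similar to the query."""
    q = bow_vector(query)
    return sorted(pool, key=lambda ex: cosine(q, bow_vector(ex["text"])),
                  reverse=True)[:k]

def build_prompt(query, examples):
    """Few-shot prompt: retrieved examples first, then the unlabeled query."""
    lines = ["Decide whether each statement is truthful or deceptive.\n"]
    for ex in examples:
        lines.append(f"Statement: {ex['text']}\nLabel: {ex['label']}\n")
    lines.append(f"Statement: {query}\nLabel:")
    return "\n".join(lines)

# Hypothetical labeled pool; real pools would come from RLTD/MU3D/OpSpam splits.
pool = [
    {"text": "I never saw the defendant that night", "label": "deceptive"},
    {"text": "The hotel room was clean and the staff friendly", "label": "truthful"},
    {"text": "I was at home watching television that night", "label": "truthful"},
]
query = "I was not near the scene that night"
prompt = build_prompt(query, select_examples(query, pool, k=2))
print(prompt)
```

With this toy pool, the two trial-style statements are retrieved ahead of the unrelated review, which is the intended effect: in-context examples that resemble the query tend to steer the model better than randomly drawn ones.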