🤖 AI Summary
This study systematically evaluates large language models (LLMs) and large multimodal models (LMMs) on multimodal deception detection across diverse real-world scenarios, including courtroom testimony, interpersonal deception, and fake reviews. We propose a unified benchmark framework integrating the RLTD, MU3D, and OpSpam datasets, and incorporate zero-/few-shot learning, similarity-driven in-context example selection, chain-of-thought prompting, nonverbal feature fusion, and video-summarization enhancement. To our knowledge, this is the first comprehensive comparison of open-source and commercial models on this task. Results show that fine-tuned LLMs achieve state-of-the-art performance on text-only lie detection, while LMMs, despite their cross-modal potential, significantly underperform unimodal baselines, revealing critical bottlenecks in exploiting visual and temporal cues. Auxiliary features and structured reasoning improve accuracy and interpretability, yet cross-modal generalization remains limited. Our work establishes a rigorous benchmark, introduces effective methodological advances, and delivers key insights for trustworthy AI-based deception detection.
📝 Abstract
Detecting deception in an increasingly digital world is both a critical and challenging task. In this study, we present a comprehensive evaluation of the automated deception detection capabilities of Large Language Models (LLMs) and Large Multimodal Models (LMMs) across diverse domains. We assess the performance of both open-source and commercial LLMs on three distinct datasets: real-life trial interviews (RLTD), instructed deception in interpersonal scenarios (MU3D), and deceptive reviews (OpSpam). We systematically analyze the effectiveness of different experimental setups for deception detection, including zero-shot and few-shot approaches with random or similarity-based in-context example selection. Our results show that fine-tuned LLMs achieve state-of-the-art performance on textual deception detection tasks, while LMMs struggle to fully leverage cross-modal cues. Additionally, we analyze the impact of auxiliary features, such as non-verbal gestures and video summaries, and examine the effectiveness of different prompting strategies, including direct label generation and chain-of-thought reasoning. Our findings provide key insights into how LLMs process and interpret deceptive cues across modalities, highlighting their potential and limitations in real-world deception detection applications.
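The similarity-based in-context example selection described above can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: a bag-of-words cosine similarity replaces the learned sentence embeddings a real retrieval setup would use, and the statements and labels are invented for the example.

```python
# Sketch: pick the k labeled examples most similar to the query statement,
# then assemble a few-shot deception-detection prompt from them.
import math
from collections import Counter

def bow_vector(text):
    """Lowercased bag-of-words term counts (stand-in for a sentence embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_examples(query, pool, k=2):
    """Return the k pool entries most similar to the query."""
    q = bow_vector(query)
    return sorted(pool, key=lambda ex: cosine(q, bow_vector(ex["text"])),
                  reverse=True)[:k]

def build_prompt(query, examples):
    """Few-shot prompt: retrieved examples first, then the unlabeled query."""
    lines = ["Decide whether each statement is truthful or deceptive.\n"]
    for ex in examples:
        lines.append(f"Statement: {ex['text']}\nLabel: {ex['label']}\n")
    lines.append(f"Statement: {query}\nLabel:")
    return "\n".join(lines)

# Hypothetical labeled pool; real pools would come from RLTD/MU3D/OpSpam splits.
pool = [
    {"text": "I never saw the defendant that night", "label": "deceptive"},
    {"text": "The hotel room was clean and the staff friendly", "label": "truthful"},
    {"text": "I was at home watching television that night", "label": "truthful"},
]
query = "I was not near the scene that night"
prompt = build_prompt(query, select_examples(query, pool, k=2))
print(prompt)
```

With this toy pool, the two trial-style statements are retrieved ahead of the unrelated review, which is the intended effect: in-context examples that resemble the query tend to steer the model better than randomly drawn ones.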