SEPSIS: I Can Catch Your Lies - A New Paradigm for Deception Detection

📅 2023-12-01

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the automatic detection of “lies of omission”, a covert deceptive practice involving the deliberate concealment of critical information. We propose a four-layer fine-grained annotation schema (type/color/intent/topic) grounded in psychological and communication theories. To enhance generalizability, we design a multi-task learning framework based on dataless merging of fine-tuned models, enabling cross-task collaborative fine-tuning of BERT and RoBERTa. Experiments reveal a significant correlation between lies of omission and propaganda techniques, for example between loaded language and the opinion type. Evaluated on a dataset of 876,784 samples, our approach achieves an F1 score of 0.87. We will release both the annotated dataset and the models under the MIT License to support reproducible and extensible research in deception detection.

📝 Abstract

Deception is the intentional practice of twisting information. It is a nuanced societal practice deeply intertwined with human societal evolution, characterized by a multitude of facets. This research explores the problem of deception through the lens of psychology, employing a framework that categorizes deception into three forms: lies of omission, lies of commission, and lies of influence. This study focuses specifically on lies of omission. We propose a novel framework for deception detection leveraging NLP techniques. We curated an annotated dataset of 876,784 samples by combining a popular large-scale fake news dataset with news headlines scraped from the Twitter handle of Times of India, a well-known Indian news media house. Each sample is labeled with four layers: (i) the type of omission (speculation, bias, distortion, sounds factual, and opinion), (ii) the color of the lie (black, white, etc.), (iii) the intention behind the lie (to influence, etc.), and (iv) the topic of the lie (political, educational, religious, etc.). We present a novel multi-task learning pipeline that leverages the dataless merging of fine-tuned language models to address this deception detection task. Our proposed model achieved an F1 score of 0.87, demonstrating strong performance across all four layers: type, color, intent, and topic. Finally, we explore the relationship between lies of omission and propaganda techniques. Our analysis revealed, for instance, a significant correlation between loaded language and opinion. To encourage further research in this field, we will make the models and dataset available under the MIT License.
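
The four annotation layers described in the abstract can be pictured as one record per sample. The sketch below is only an illustration: the class and field names (`OmissionType`, `OmissionAnnotation`, `as_targets`) are hypothetical and not taken from the released dataset. Only the five omission types are enumerated, because the abstract lists the color, intent, and topic labels with "etc."; those layers are kept as free strings rather than guessed at.

```python
from dataclasses import dataclass
from enum import Enum


class OmissionType(Enum):
    """The five omission types named in the abstract."""
    SPECULATION = "speculation"
    BIAS = "bias"
    DISTORTION = "distortion"
    SOUNDS_FACTUAL = "sounds factual"
    OPINION = "opinion"


@dataclass(frozen=True)
class OmissionAnnotation:
    """One sample with its four label layers (hypothetical field names)."""
    text: str
    omission_type: OmissionType
    color: str   # e.g. "black", "white"; the full label set is not given in the abstract
    intent: str  # e.g. "to influence"; the full label set is not given in the abstract
    topic: str   # e.g. "political", "educational", "religious"

    def as_targets(self) -> dict:
        """Return one target per layer, as a multi-task model might consume them."""
        return {
            "type": self.omission_type.value,
            "color": self.color,
            "intent": self.intent,
            "topic": self.topic,
        }


# Made-up sample for illustration only; it is not drawn from the dataset.
example = OmissionAnnotation(
    text="(hypothetical headline)",
    omission_type=OmissionType.OPINION,
    color="white",
    intent="to influence",
    topic="political",
)
print(example.as_targets())
```

Keeping each layer as a separate target makes it straightforward to train or evaluate one prediction head per layer.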

Problem

Research questions and friction points this paper is trying to address.

Detecting deception types using NLP techniques

Analyzing lies of omission in news content

Exploring links between omission lies and propaganda

Innovation

Methods, ideas, or system contributions that make the work stand out.

NLP-based deception detection framework

Multi-task learning with fine-tuned models

Annotated dataset with four labeling layers
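
To give a feel for what "dataless merging" of fine-tuned models means, here is a minimal sketch that combines models purely in parameter space, without touching any training data. It uses plain weighted averaging as a stand-in; this page does not specify the merging rule the paper actually uses, so the averaging rule is an assumption. The function name `average_merge`, the parameter names, and the toy values are all hypothetical, and parameters are shown as flat lists of floats for simplicity.

```python
def average_merge(state_dicts, weights=None):
    """Merge models that share parameter names by weighted averaging.

    Assumption: simple averaging stands in for the paper's merging method.
    Each state dict maps a parameter name to a flat list of floats.
    """
    if not state_dicts:
        raise ValueError("need at least one model")
    if weights is None:
        # Default to a uniform average over all models.
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    if len(weights) != len(state_dicts):
        raise ValueError("one weight per model is required")
    keys = set(state_dicts[0])
    if any(set(sd) != keys for sd in state_dicts):
        raise ValueError("all models must share the same parameter names")

    merged = {}
    for name in state_dicts[0]:
        # Group the i-th value of this parameter across all models.
        columns = zip(*(sd[name] for sd in state_dicts))
        merged[name] = [
            sum(w * v for w, v in zip(weights, column)) for column in columns
        ]
    return merged


# Toy parameters for two task-specific models (made-up values).
type_model = {"encoder.w": [0.25, 0.5], "encoder.b": [0.0]}
topic_model = {"encoder.w": [0.75, 0.0], "encoder.b": [1.0]}

merged = average_merge([type_model, topic_model])
print(merged)
```

No sample from either task's dataset is needed at merge time, which is the property the term "dataless" refers to.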

🔎 Similar Papers

MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics