🤖 AI Summary
This paper addresses the challenge of assessing witness testimony reliability in judicial forensics by formally introducing and defining the novel task of *context-aware fine-grained inconsistency identification*, which aims to precisely localize contradictory segments across testimony pairs and support multi-hop reasoning across the six event dimensions (Who, What, When, Where, Why, How). To this end, we construct MIND—the first benchmark dataset featuring both explicit and implicit contradiction annotations—and propose INTEND, a method integrating 6W event element modeling, instruction tuning, multi-hop reasoning, and collaborative adaptation of masked language modeling (MLM) and large language models (LLMs). Experiments demonstrate that INTEND achieves a 5.63% F1-score improvement over conventional fine-tuning and standard prompting baselines, significantly enhancing both robustness and interpretability in inconsistency detection.
📝 Abstract
Incongruence detection in eyewitness narratives is critical for assessing the reliability of testimonies, yet traditional approaches often fail to address the nuanced inconsistencies inherent in such accounts. In this paper, we introduce a novel task of incongruence detection in eyewitness testimonies. Given a pair of testimonies, each consisting of multiple question–answer pairs from two subjects, we identify contextually related incongruences between the two subjects and mark the spans of these incongruences in the utterances. To achieve this, we develop MIND (MultI-EyewitNess Deception), a comprehensive dataset of 2927 pairs of contextually related answers designed to capture both explicit and implicit contradictions. We also propose INTEND, an INstruction-TunEd iNcongruity Detection framework based on 6W and multi-hop reasoning. Drawing from investigative techniques, INTEND addresses the task as a cloze-style problem, contrasting the content along the who, what, when, where, why, and how aspects. Our findings show that prompt tuning, especially when utilizing our framework, enhances the detection of incongruences by a margin of +5.63%. We compare our approach with multiple fine-tuning and prompt-tuning techniques on MLMs and LLMs. Empirical results demonstrate a convincing improvement in F1-score over fine-tuned and regular prompt-tuning techniques, highlighting the effectiveness of our approach.
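As a rough illustration of the 6W formulation (this is a simplified sketch, not the authors' INTEND pipeline; the field names, dictionary representation, and exact-match comparison are all simplifying assumptions), a contradiction check over pre-extracted event elements might look like:

```python
# Toy sketch only: flag 6W dimensions on which two testimony answers disagree.
# A real system would use model-based semantic comparison, not string equality.

def find_incongruences(answer_a: dict, answer_b: dict) -> list:
    """Return the 6W dimensions on which two answers disagree."""
    dimensions = ["who", "what", "when", "where", "why", "how"]
    incongruent = []
    for dim in dimensions:
        a, b = answer_a.get(dim), answer_b.get(dim)
        # Only compare dimensions both subjects actually mention;
        # a mismatch marks an explicit-contradiction candidate.
        if a is not None and b is not None and a != b:
            incongruent.append(dim)
    return incongruent

# Hypothetical example: two subjects disagree on when and where the event occurred.
subject_1 = {"who": "the driver", "when": "around 9 pm", "where": "Main Street"}
subject_2 = {"who": "the driver", "when": "after midnight", "where": "the parking lot"}
print(find_incongruences(subject_1, subject_2))  # → ['when', 'where']
```

Implicit contradictions (e.g., "around 9 pm" vs. "it was already dark") would not surface under such a literal comparison, which is precisely the gap the MIND annotations and INTEND's multi-hop reasoning are designed to address.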