Open Domain Question Answering with Conflicting Contexts

📅 2024-10-16
🏛️ arXiv.org
📈 Citations: 2
Influential: 1
🤖 AI Summary
In open-domain question answering, retrieved contexts frequently contain factual contradictions, which distort model outputs. This work first quantifies the issue, finding that 25% of unambiguous questions retrieve conflicting contexts, and introduces QACC, a human-annotated benchmark that exposes the systematic failures of mainstream large language models (LLMs) under such conditions. To address this, the authors propose *explanation-driven interpretable fine-tuning*, a paradigm in which human-written conflict reasoning (identifying contradiction sources, weighing conflicting evidence, and deriving logically consistent conclusions) serves as explicit supervision for training LLMs (e.g., LLaMA-2) to perform conflict-aware reasoning. On QACC, the method achieves an average accuracy gain of 18.7%, substantially improving models' ability to detect, evaluate, and resolve contradictory information while maintaining answer consistency. This work contributes both a methodological framework and a foundational benchmark for robust open-domain QA.
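The explanation-supervised finetuning described above can be sketched as a data-formatting step: each training target places the human annotator's conflict explanation before the final answer, so the model learns to surface disagreements before committing. A minimal sketch, assuming a hypothetical record format; the field names, prompt wording, and example data below are illustrative, not taken from the paper:

```python
def build_training_example(question, contexts, explanation, answer):
    """Format one QACC-style example for explanation-supervised finetuning.

    The supervision target puts the reasoning first and the answer last,
    so the model is trained to identify and weigh the conflict before
    producing an answer.
    """
    context_block = "\n\n".join(
        f"[Context {i + 1}] {c}" for i, c in enumerate(contexts)
    )
    prompt = (
        "Some of the retrieved contexts below may conflict. "
        "Explain which contexts disagree and why, then answer.\n\n"
        f"{context_block}\n\nQuestion: {question}\nExplanation and answer:"
    )
    target = f"{explanation}\nAnswer: {answer}"
    return {"prompt": prompt, "target": target}


# Illustrative (fabricated) example of one formatted record:
example = build_training_example(
    question="What year did the bridge open?",
    contexts=["The bridge opened in 1931.",
              "Construction finished in 1933."],
    explanation=("The contexts give different years: one reports the opening "
                 "date and the other the end of construction, so the opening "
                 "year answers the question as asked."),
    answer="1931",
)
```

Pairs like `example` would then feed a standard supervised finetuning loop; the design choice is simply that the loss covers the explanation tokens as well as the answer tokens.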

📝 Abstract
Open domain question answering systems frequently rely on information retrieved from large collections of text (such as the Web) to answer questions. However, such collections of text often contain conflicting information, and indiscriminately depending on this information may result in untruthful and inaccurate answers. To understand the gravity of this problem, we collect a human-annotated dataset, Question Answering with Conflicting Contexts (QACC), and find that as much as 25% of unambiguous, open domain questions can lead to conflicting contexts when retrieved using Google Search. We evaluate and benchmark three powerful Large Language Models (LLMs) with our dataset QACC and demonstrate their limitations in effectively addressing questions with conflicting information. To explore how humans reason through conflicting contexts, we request our annotators to provide explanations for their selections of correct answers. We demonstrate that by finetuning LLMs to explain their answers, we can introduce richer information into their training that guides them through the process of reasoning with conflicting contexts.
Problem

Research questions and friction points this paper is trying to address.

Addressing untruthful answers from conflicting web information in QA systems
Evaluating LLMs' limitations in handling questions with conflicting contexts
Improving LLM reasoning by finetuning with human-like explanation strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-annotated dataset QACC for conflicting contexts
Benchmarking LLMs on handling conflicting information
Finetuning LLMs with explanations to improve reasoning