Discourse-Driven Evaluation: Unveiling Factual Inconsistency in Long Document Summarization

📅 2025-02-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the challenge of detecting factual inconsistency in long-document summarization—a difficulty arising from complex document structures and lengthy summaries. We propose a discourse-aware framework for factual consistency assessment. Methodologically, we establish, for the first time, a systematic linkage between factual errors and discourse-level structural cues—including discourse connectives, rhetorical relations, and complex syntactic constructions. Leveraging discourse analysis theory, we decompose texts into semantic units and integrate discourse connective and rhetorical structure features to enhance sentence-level natural language inference (NLI) capabilities; multi-granularity aggregation then yields paragraph- and document-level consistency scores. Evaluated across multi-domain long-summarization benchmarks (news, legal, scientific), our approach significantly outperforms existing baselines, achieving up to a 12.3% absolute improvement in factual consistency classification accuracy—thereby advancing beyond conventional flat, token- or sentence-level scoring paradigms.

Technology Category

Application Category

📝 Abstract
Detecting factual inconsistency for long document summarization remains challenging, given the complex structure of the source article and long summary length. In this work, we study factual inconsistency errors and connect them with a line of discourse analysis. We find that errors are more common in complex sentences and are associated with several discourse features. We propose a framework that decomposes long texts into discourse-inspired chunks and utilizes discourse information to better aggregate sentence-level scores predicted by natural language inference models. Our approach shows improved performance on top of different model baselines over several evaluation benchmarks, covering rich domains of texts, focusing on long document summarization. This underscores the significance of incorporating discourse features in developing models for scoring summaries for long document factual inconsistency.
Problem

Research questions and friction points this paper is trying to address.

Detecting factual inconsistency in summaries
Analyzing discourse features for errors
Improving long document summarization models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Discourse analysis for inconsistency detection
Decomposing texts into discourse chunks
Aggregating scores with discourse features