🤖 AI Summary
Sentences extracted in isolation are often semantically incomplete because of unresolved coreference and missing background information. To address this in zero-shot sentence decontextualization, this paper proposes a two-stage content selection and planning framework. First, it identifies ambiguous constituents via semantic unit segmentation and coreference detection. Second, it models discourse relations to match and rank contextual fragments, and from these generates a structured rewriting plan. The method operates purely in a zero-shot setting without fine-tuning, combining semantic analysis with controllable generation. Experiments show improvements over existing zero-shot baselines across multiple benchmarks, with gains in semantic completeness, coherence, and readability. The work is presented as the first to explicitly incorporate discourse relations into the content planning phase of decontextualization, offering an interpretable and controllable approach to making sentences understandable out of context.
📝 Abstract
Extracting individual sentences from a document as evidence or reasoning steps is common in many NLP tasks. However, extracted sentences often lack the context needed to understand them, such as resolved coreferences and background information. To address this, we propose a content selection and planning framework for zero-shot decontextualisation, which determines what content should be mentioned, and in what order, for a sentence to be understood out of context. Specifically, given a potentially ambiguous sentence and its context, we first segment it into basic semantically independent units. We then identify potentially ambiguous units in the given sentence and extract relevant units from the context based on their discourse relations. Finally, we generate a content plan for rewriting the sentence by enriching each ambiguous unit with its relevant units. Experimental results demonstrate that our approach is competitive for sentence decontextualisation and outperforms existing methods, producing sentences with better semantic integrity and discourse coherence.
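The segment, identify, extract, and plan steps described in the abstract can be illustrated with a toy Python sketch. This is not the authors' implementation: every name here (`segment`, `is_ambiguous`, `rank_context`, `build_plan`, `PlanItem`) is hypothetical, units are split naively on punctuation, ambiguity is approximated by a small pronoun list, and plain word overlap stands in for the discourse relations the paper uses to select context.

```python
import re
from dataclasses import dataclass, field

# Assumption: pronouns and demonstratives are a crude proxy for "ambiguous".
AMBIGUOUS_CUES = {"he", "she", "it", "they", "him", "her", "them",
                  "his", "its", "their", "this", "that", "these", "those"}
# Assumption: a tiny stopword list so overlap reflects content words only.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "for", "was", "is", "and", "to"}


@dataclass
class PlanItem:
    unit: str                 # a unit of the target sentence
    ambiguous: bool           # whether the unit needs enrichment
    support: list = field(default_factory=list)  # context units to add, in order


def segment(text):
    """Split text into rough units at punctuation (toy stand-in for
    semantic unit segmentation)."""
    return [p.strip() for p in re.split(r"[.;,!?]\s*", text) if p.strip()]


def tokens(unit):
    """Lowercased word set of a unit."""
    return set(re.findall(r"[a-z']+", unit.lower()))


def is_ambiguous(unit):
    """Flag a unit that contains any cue word."""
    return bool(tokens(unit) & AMBIGUOUS_CUES)


def rank_context(unit, context_units, top_k=1):
    """Rank context units by content-word overlap with the unit.
    Word overlap replaces discourse-relation matching in this sketch."""
    content = tokens(unit) - AMBIGUOUS_CUES - STOPWORDS
    scored = [(len(content & tokens(c)), i, c) for i, c in enumerate(context_units)]
    # Highest overlap first; ties keep document order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [c for score, _, c in scored if score > 0][:top_k]


def build_plan(sentence, context, top_k=1):
    """Pair each unit of the sentence with the context units that enrich it."""
    context_units = segment(context)
    plan = []
    for unit in segment(sentence):
        ambiguous = is_ambiguous(unit)
        support = rank_context(unit, context_units, top_k) if ambiguous else []
        plan.append(PlanItem(unit, ambiguous, support))
    return plan


if __name__ == "__main__":
    context = ("Marie Curie studied radioactivity in Paris. "
               "The weather was mild that year.")
    sentence = "She won the Nobel Prize for her work on radioactivity."
    for item in build_plan(sentence, context):
        print(item)
```

The resulting plan only records which context units should accompany each ambiguous unit; the final rewriting step, which a generation model would perform from such a plan, is left out of the sketch.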