Improving Zero-shot Sentence Decontextualisation with Content Selection and Planning

📅 2025-09-22

🤖 AI Summary

Sentences extracted in isolation are often semantically incomplete because they contain unresolved coreferences and lack background information. To address this in zero-shot sentence decontextualization, this paper proposes a two-stage content selection and planning framework. First, it identifies ambiguous constituents via semantic unit segmentation and coreference detection; second, it models discourse relations to match and rank contextual fragments, thereby generating a structured rewriting plan. The method operates purely in a zero-shot setting, without fine-tuning, and combines semantic analysis with controllable generation. Experiments show improvements over existing zero-shot baselines, with rewritten sentences that are more semantically complete and more coherent. The work explicitly incorporates discourse relations into the content planning phase of decontextualization, which makes the rewriting process interpretable and controllable.

📝 Abstract

Extracting individual sentences from a document as evidence or reasoning steps is commonly done in many NLP tasks. However, extracted sentences often lack context necessary to make them understood, e.g., coreference and background information. To this end, we propose a content selection and planning framework for zero-shot decontextualisation, which determines what content should be mentioned and in what order for a sentence to be understood out of context. Specifically, given a potentially ambiguous sentence and its context, we first segment it into basic semantically-independent units. We then identify potentially ambiguous units from the given sentence, and extract relevant units from the context based on their discourse relations. Finally, we generate a content plan to rewrite the sentence by enriching each ambiguous unit with its relevant units. Experimental results demonstrate that our approach is competitive for sentence decontextualisation, producing sentences that exhibit better semantic integrity and discourse coherence, outperforming existing methods.
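The pipeline described in the abstract (segment, flag ambiguous units, select relevant context units, build a plan) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the punctuation-based segmenter, the pronoun list used to flag ambiguity, and the lexical-overlap score that stands in for discourse relations are simple placeholders, not the components used in the paper. The names `segment`, `is_ambiguous`, `select_relevant`, `build_plan`, and `PlanItem` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Placeholder ambiguity cue (assumption): a unit is ambiguous if it has a pronoun.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them",
            "his", "its", "their", "this", "these", "those"}


@dataclass
class PlanItem:
    unit: str                                   # unit from the target sentence
    enrich_with: List[str] = field(default_factory=list)  # context units to add


def segment(text: str) -> List[str]:
    """Crude proxy for semantic-unit segmentation: split on . ; and ,"""
    return [p.strip() for p in re.split(r"[.;,]\s*", text) if p.strip()]


def is_ambiguous(unit: str) -> bool:
    """Flag a unit that contains any pronoun from the placeholder list."""
    return any(t in PRONOUNS for t in re.findall(r"[a-z']+", unit.lower()))


def content_words(text: str) -> set:
    """Lower-cased words longer than three letters, excluding pronouns."""
    return {t for t in re.findall(r"[a-z]+", text.lower())
            if len(t) > 3 and t not in PRONOUNS}


def select_relevant(sentence: str, context_units: List[str], k: int = 1) -> List[str]:
    """Stand-in for discourse relations: rank context units by word overlap
    with the whole sentence, prefer later units on ties, keep the top k."""
    target = content_words(sentence)
    scored = [(len(content_words(c) & target), i, c)
              for i, c in enumerate(context_units)]
    scored = [s for s in scored if s[0] > 0]
    top = sorted(scored, key=lambda s: (-s[0], -s[1]))[:k]
    # Restore document order so the plan reads in the order of the context.
    return [c for _, _, c in sorted(top, key=lambda s: s[1])]


def build_plan(sentence: str, context: str, k: int = 1) -> List[PlanItem]:
    """One plan item per sentence unit; only ambiguous units are enriched."""
    context_units = segment(context)
    plan = []
    for unit in segment(sentence):
        support = select_relevant(sentence, context_units, k) if is_ambiguous(unit) else []
        plan.append(PlanItem(unit, support))
    return plan


# Invented example text, used only to exercise the sketch.
context = "The Northgate bridge spans the river. Engineers designed the structure."
sentence = "It was approved in 1998, and the bridge opened in 2004."
for item in build_plan(sentence, context):
    print(item.unit, "<-", item.enrich_with)
```

In this sketch only the first unit is flagged, because it contains "It", and it is paired with the context unit that shares the word "bridge" with the sentence. A real system would replace each placeholder with a proper segmenter, ambiguity detector, and discourse-relation model.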

Problem

Research questions and friction points this paper is trying to address.

Extracted sentences lack context like coreference and background information

Determining what content to include and its order for understanding

Rewriting ambiguous sentences by enriching them with relevant context units

Innovation

Methods, ideas, or system contributions that make the work stand out.

Content selection framework identifies ambiguous units

Discourse relation extraction retrieves relevant context units

Content planning generates semantically coherent rewritten sentences
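One way a content plan could drive zero-shot rewriting is to serialise it into an instruction for a language model. The sketch below assumes that setup, which this page does not confirm. The prompt wording, the `Plan` type, and the `render_prompt` function are hypothetical, and no model is called.

```python
from typing import List, Tuple

# A plan is an ordered list of (sentence unit, supporting context units).
Plan = List[Tuple[str, List[str]]]


def render_prompt(sentence: str, plan: Plan) -> str:
    """Turn an ordered content plan into a plain-text rewriting instruction."""
    lines = [
        "Rewrite the sentence so it can be understood without its source document.",
        "Follow the plan in order and add only the listed context.",
        "",
        f"Sentence: {sentence}",
        "Plan:",
    ]
    for step, (unit, support) in enumerate(plan, start=1):
        if support:
            # Ambiguous unit: name the context it should be enriched with.
            lines.append(f'{step}. Clarify "{unit}" using: ' + " | ".join(support))
        else:
            # Unambiguous unit: carried over as-is.
            lines.append(f'{step}. Keep "{unit}" unchanged.')
    lines.append("Rewritten sentence:")
    return "\n".join(lines)


# Invented example plan, used only to exercise the sketch.
example_plan = [
    ("It was approved in 1998", ["The Northgate bridge spans the river"]),
    ("and the bridge opened in 2004", []),
]
prompt = render_prompt("It was approved in 1998, and the bridge opened in 2004.", example_plan)
print(prompt)
```

Listing the units in order keeps the "what to mention and in what order" decision explicit, so the plan can be inspected or edited before any text is generated.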

🔎 Similar Papers

ZeroDL: Zero-shot Distribution Learning for Text Clustering via Large Language Models