🤖 AI Summary
Long user reviews make content selection difficult for summarization, and end-to-end models suffer from poor coherence and information loss when trained on weakly aligned corpora. Method: this paper proposes an embedding-guided extractive-abstractive summarization framework that uses pretrained sentence embeddings (e.g., SBERT) as structured intermediate supervision, replacing conventional sentence-selection probability prediction, and jointly optimizes the extractive sentence selector and the abstractive sequence-to-sequence model (T5/BART) via an embedding-space regression loss. Results: on a hotel review summarization dataset, the method achieves a 2.3-point ROUGE-L improvement over the state of the art; human evaluation confirms significant gains in summary relevance and fluency. The core contribution is the introduction of sentence embeddings as intermediate supervision, which mitigates the weak-alignment challenge inherent in long-input summarization.
📝 Abstract
Current neural network-based methods for document summarisation struggle when applied to datasets containing large inputs. In this paper we propose a new approach to the challenge of content selection in end-to-end summarisation of user reviews of accommodations. We show that combining an extractive approach and externally pre-trained sentence-level embeddings with an abstractive summarisation model outperforms existing methods on the task of summarising a large-input dataset. We also show that predicting the sentence-level embedding of a summary yields a higher-quality end-to-end system for loosely aligned source-to-target corpora than the common approach of predicting probability distributions over sentence selection.
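To make the core idea concrete, the sketch below illustrates the embedding-space supervision described above: review sentences are ranked against a gold summary embedding (standing in for an extractive selector trained with a regression rather than a selection-probability objective), and the training signal is a mean-squared-error loss between embeddings. The tiny 3-dimensional vectors, the function names, and the top-k selection are all illustrative assumptions, not the paper's actual implementation (which uses SBERT embeddings and a jointly trained T5/BART model).

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def embedding_regression_loss(pred, target):
    # Mean squared error in embedding space: the intermediate supervision
    # signal that replaces sentence-selection probability prediction.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def select_sentences(sentence_embs, summary_emb, k=2):
    # Rank review sentences by similarity to the summary embedding and
    # keep the top-k as extractive input for the abstractive model.
    ranked = sorted(range(len(sentence_embs)),
                    key=lambda i: cosine(sentence_embs[i], summary_emb),
                    reverse=True)
    return sorted(ranked[:k])

# Toy 3-dimensional "embeddings" standing in for real SBERT vectors.
sents = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.2], [0.8, 0.2, 0.1]]
gold = [0.85, 0.15, 0.05]

picked = select_sentences(sents, gold, k=2)
loss = embedding_regression_loss(sents[picked[0]], gold)
print(picked)            # → [0, 2]
print(round(loss, 4))    # → 0.0025
```

In the full system this loss would be backpropagated through the selector and the sequence-to-sequence model jointly; here it only demonstrates that the supervision target is a continuous vector rather than a categorical distribution over sentences.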