Using Multimodal and Language-Agnostic Sentence Embeddings for Abstractive Summarization

📅 2026-03-09
🤖 AI Summary
This work addresses the factual inaccuracies and hallucinations commonly observed in abstractive summarization. The authors propose SBARThez, a framework that feeds multimodal, language-agnostic sentence embeddings from LaBSE, SONAR, and BGE-M3 into a modified BART-based French model, and adds a named entity injection mechanism to improve the factual consistency of generated summaries. SBARThez accepts both text and speech inputs and supports cross-lingual abstractive summarization. In experiments, it performs competitively with token-level baselines, especially for low-resource languages, while producing more concise and abstract summaries.
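
To make the idea of sentence-level input concrete, here is a minimal sketch of how a document could be turned into one vector per sentence before reaching the summarizer. It is an illustration under stated assumptions, not the authors' pipeline: `toy_sentence_embedding` is a hypothetical stand-in for a pretrained encoder such as LaBSE, SONAR, or BGE-M3, the regex splitter is deliberately naive, and `EMBED_DIM` is a toy value.

```python
import hashlib
import re

EMBED_DIM = 8  # toy size (assumption); real sentence encoders use much larger vectors


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def toy_sentence_embedding(sentence):
    """Hypothetical stand-in for a pretrained sentence encoder.

    Returns a deterministic fixed-size vector derived from a hash,
    so it carries no semantic meaning at all.
    """
    digest = hashlib.sha256(sentence.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:EMBED_DIM]]


def encode_document(text):
    """Map a document to a sequence with one vector per sentence."""
    return [toy_sentence_embedding(s) for s in split_sentences(text)]


doc = ("Le maire a inauguré le pont. La foule était nombreuse. "
       "Les travaux ont duré deux ans.")
encoder_input = encode_document(doc)
print(len(encoder_input), len(encoder_input[0]))
```

The point of the sketch is the shape of the input: the sequence length equals the number of sentences rather than the number of tokens, and the same downstream model could in principle consume vectors produced from text or from speech, provided the encoder maps both into one shared space.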

📝 Abstract
Abstractive summarization aims to generate concise summaries by creating new sentences, allowing for flexible rephrasing. However, this approach can be vulnerable to inaccuracies, particularly "hallucinations", where the model introduces non-existent information. In this paper, we leverage multimodal and multilingual sentence embeddings derived from pretrained models such as LaBSE, SONAR, and BGE-M3, and feed them into a modified BART-based French model. To improve the factual consistency of the generated summary, we introduce a Named Entity Injection mechanism that appends tokenized named entities to the decoder input. Our novel framework, SBARThez, is applicable to both text and speech inputs and supports cross-lingual summarization; it shows competitive performance relative to token-level baselines, especially for low-resource languages, while generating more concise and abstract summaries.
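
The Named Entity Injection step described in the abstract can be sketched roughly as follows. This is an illustration only: the abstract does not specify the exact input layout, so the `<ent>` separator token, the whitespace tokenizer, and the function name `inject_named_entities` are all assumptions made for the example.

```python
BOS = "<s>"    # start-of-sequence token, as used by BART-style models
SEP = "<ent>"  # hypothetical separator marking where injected entities begin


def tokenize(text):
    """Whitespace tokenizer (assumption); a real system would use subword units."""
    return text.split()


def inject_named_entities(decoder_tokens, entities):
    """Return a copy of the decoder input with tokenized entities appended."""
    injected = list(decoder_tokens)
    if entities:
        injected.append(SEP)
        for entity in entities:
            injected.extend(tokenize(entity))
    return injected


decoder_tokens = [BOS]
entities = ["Emmanuel Macron", "Paris"]
print(inject_named_entities(decoder_tokens, entities))
# prints ['<s>', '<ent>', 'Emmanuel', 'Macron', 'Paris']
```

The intuition is that sentence embeddings compress away surface forms, so exposing the entity tokens directly to the decoder gives it the exact names to copy instead of leaving it to reconstruct them.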
Problem

Research questions and friction points this paper is trying to address.

abstractive summarization
hallucination
factual consistency
multimodal embeddings
cross-lingual summarization
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal sentence embeddings
language-agnostic representation
named entity injection
abstractive summarization
cross-lingual summarization