Benchmarking Abstractive Summarisation: A Dataset of Human-authored Summaries of Norwegian News Articles

📅 2025-01-13
🤖 AI Summary
This study addresses Norwegian news summarization by introducing NordicSumm, the first high-quality, dual-variant (Bokmål/Nynorsk) human-written summarization benchmark, featuring three reference summaries per source article, each written by a native Norwegian speaker. Methodologically, the authors employ rigorous annotation guidelines, multi-round expert validation, and a blind human evaluation protocol to systematically assess the abstractive summarization capabilities of state-of-the-art Norwegian LLMs under zero-shot and fine-tuned settings. Key contributions are: (1) the release of the first multi-reference, dual-variant, human-annotated Norwegian summarization benchmark; (2) the first human evaluation revealing substantial gaps between current models and human performance in coherence, information coverage, and linguistic naturalness; and (3) empirical evidence that NordicSumm poses a meaningful challenge to existing Norwegian LLMs, establishing a robust and reliable evaluation standard for future research.

📝 Abstract
We introduce a dataset of high-quality human-authored summaries of news articles in Norwegian. The dataset is intended for benchmarking the abstractive summarisation capabilities of generative language models. Each document in the dataset is provided with three different candidate gold-standard summaries written by native Norwegian speakers, and all summaries are provided in both written variants of Norwegian: Bokmål and Nynorsk. The paper describes the data creation effort and evaluates existing open LLMs for Norwegian on the dataset. We also provide insights from a manual human evaluation comparing human-authored and model-generated summaries. Our results indicate that the dataset provides a challenging LLM benchmark for Norwegian summarisation capabilities.
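
To illustrate how a benchmark with three reference summaries per article might be used, the following is a minimal sketch of multi-reference scoring. It is an illustration only: the abstract does not say which automatic metrics the authors used, so the unigram-overlap F1 (a simplified ROUGE-1 style measure), the max-over-references rule, and all names here (`Record`, `unigram_f1`, `multi_reference_score`, and the field names) are assumptions rather than the dataset's actual schema or the paper's method.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    # Hypothetical layout; field names are assumptions, not the released schema.
    article: str
    summaries_nb: List[str]  # three Bokmål reference summaries
    summaries_nn: List[str]  # three Nynorsk reference summaries


def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two whitespace-tokenised, lowercased texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def multi_reference_score(candidate: str, references: List[str]) -> float:
    """Score a candidate against several references, keeping the best match."""
    return max(unigram_f1(candidate, ref) for ref in references)
```

Taking the maximum over references rewards a model summary for matching any one of the valid human summaries; averaging over references is an equally plausible choice and would penalise summaries that resemble only one reference.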
Problem

Research questions and friction points this paper is trying to address.

Generative Language Models
Norwegian News Summarization
Performance Evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Norwegian News Summary Dataset
Bokmål and Nynorsk Written Variants
Generative Language Model Evaluation