🤖 AI Summary
This study addresses the challenge of measuring event information coverage when evaluating automatic news summarization. The authors propose an evaluation method based on event overlap, departing from conventional paradigms that rely on lexical or sentence-level overlap and similarity scores. Instead, the approach treats the event as the fundamental unit of evaluation: structured events are extracted from generated summaries, reference summaries, and source articles, then matched semantically, with human-annotated event labels serving as a gold standard. Experiments on a richly annotated Norwegian news dataset show that the method yields finer-grained, more interpretable insight into how well summaries preserve the events reported in the source articles. The core contribution is the adoption of events as the central semantic unit of summarization quality assessment, moving evaluation from shallow surface matching toward deeper semantic coverage.
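To make the matching step concrete, here is a minimal sketch of one plausible way to match extracted events semantically. The paper does not specify its matcher; the embedding model (`all-MiniLM-L6-v2`), the similarity threshold, the plain-string event representation, and the greedy one-to-one strategy are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: greedy one-to-one semantic matching of extracted
# events, using cosine similarity of sentence embeddings as the criterion.
# Model, threshold, and event representation are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def match_events(candidate_events, reference_events, threshold=0.7):
    """Return (i, j) pairs of candidate/reference events whose embedding
    similarity exceeds the threshold, matched greedily one-to-one."""
    if not candidate_events or not reference_events:
        return []
    cand_emb = model.encode(candidate_events, convert_to_tensor=True)
    ref_emb = model.encode(reference_events, convert_to_tensor=True)
    sim = util.cos_sim(cand_emb, ref_emb)  # shape: (n_cand, n_ref)

    # Greedy matching: repeatedly take the highest-similarity unused pair.
    pairs, used_cand, used_ref = [], set(), set()
    scored = [
        (sim[i][j].item(), i, j)
        for i in range(len(candidate_events))
        for j in range(len(reference_events))
    ]
    for score, i, j in sorted(scored, reverse=True):
        if score < threshold:
            break  # all remaining pairs are below the threshold
        if i in used_cand or j in used_ref:
            continue
        pairs.append((i, j))
        used_cand.add(i)
        used_ref.add(j)
    return pairs
```

A greedy matcher is the simplest choice here; an optimal assignment (e.g. Hungarian matching) would be a natural alternative if one-to-one quality matters.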
📝 Abstract
An abstractive summary of a news article contains its most important information in condensed form. The evaluation of summaries automatically generated by language models relies heavily on human-authored summaries as gold references, against which overlapping units or similarity scores are computed. News articles report events, and ideally so should their summaries. In this work, we propose to evaluate the quality of abstractive summaries by calculating overlapping events between generated summaries, reference summaries, and the original news articles. We experiment on a richly annotated Norwegian dataset comprising both event annotations and summaries authored by expert human annotators. Our approach provides more insight into the event information contained in the summaries.
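Given an event matcher like the sketch above, the overlap calculation itself reduces to set-style scores over matched events. The function below is a hypothetical illustration of such scores against both the reference summary and the source article; the metric names and the reuse of `match_events` are assumptions for this sketch, not the paper's defined measures.

```python
def event_overlap_scores(generated, reference, source, match=match_events):
    """Event-level precision/recall of a generated summary against the
    reference, plus its coverage of events reported in the source article."""
    ref_matches = match(generated, reference)
    src_matches = match(generated, source)
    precision = len(ref_matches) / len(generated) if generated else 0.0
    recall = len(ref_matches) / len(reference) if reference else 0.0
    coverage = len(src_matches) / len(source) if source else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "f1": f1, "source_coverage": coverage}

# Toy usage with invented event strings (purely illustrative):
gen = ["parliament passed the climate bill", "protests erupted in Oslo"]
ref = ["the climate bill was approved by parliament"]
src = ["the climate bill was approved by parliament",
       "demonstrations took place in Oslo",
       "the prime minister commented on the vote"]
print(event_overlap_scores(gen, ref, src))
```

Scoring against the source article as well as the reference is what distinguishes this setup from reference-only metrics: it can flag events a summary covers that the reference omits, and vice versa.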