SoccerHigh: A Benchmark Dataset for Automatic Soccer Video Summarization

📅 2025-09-01

🤖 AI Summary

To address the lack of publicly available benchmark datasets for soccer video highlight generation, this paper introduces SoccerHigh, a dataset dedicated to this task that comprises 237 professional matches from La Liga, Ligue 1, and Serie A, with annotated shot boundaries and highlight segments based on broadcast footage from SoccerNet. We propose length-constrained evaluation metrics to improve the comparability and practical relevance of results. Methodologically, we integrate temporal action localization with keyframe extraction into an end-to-end deep learning framework for highlight prediction. On the test set, our approach achieves an F1 score of 0.3956. All data, source code, and baseline models are publicly released to support reproducible research in soccer video understanding.
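The summary reports an F1 score but does not say how it is computed. One common way to score a predicted highlight reel against a reference is temporal overlap: precision is the fraction of predicted seconds that fall inside the reference highlights, and recall is the fraction of reference seconds that the prediction recovers. The sketch below assumes that definition and represents segments as hypothetical (start, end) pairs in seconds. The paper's exact protocol (for example, matching at the shot level rather than the second level) may differ.

```python
from typing import List, Tuple

# Hypothetical segment format: (start_s, end_s) in seconds.
Interval = Tuple[float, float]


def total_length(intervals: List[Interval]) -> float:
    """Sum of segment durations, assuming segments in the list do not overlap."""
    return sum(end - start for start, end in intervals)


def overlap(pred: List[Interval], ref: List[Interval]) -> float:
    """Seconds covered by both lists, assuming each list is internally non-overlapping."""
    total = 0.0
    for p_start, p_end in pred:
        for r_start, r_end in ref:
            total += max(0.0, min(p_end, r_end) - max(p_start, r_start))
    return total


def temporal_f1(pred: List[Interval], ref: List[Interval]) -> float:
    """Harmonic mean of temporal precision and recall (an assumed definition)."""
    inter = overlap(pred, ref)
    if inter == 0.0:
        return 0.0
    precision = inter / total_length(pred)
    recall = inter / total_length(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    predicted = [(0, 10), (20, 30)]
    reference = [(5, 15), (20, 25)]
    print(round(temporal_f1(predicted, reference), 4))  # 0.5714
```

In this example 10 of the 20 predicted seconds overlap the reference (precision 0.5) and 10 of the 15 reference seconds are recovered (recall about 0.667), giving an F1 of 4/7.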

📝 Abstract

Video summarization aims to extract key shots from longer videos to produce concise and informative summaries. One of its most common applications is in sports, where highlight reels capture the most important moments of a game, along with notable reactions and specific contextual events. Automatic summary generation can support video editors in the sports media industry by reducing the time and effort required to identify key segments. However, the lack of publicly available datasets poses a challenge in developing robust models for sports highlight generation. In this paper, we address this gap by introducing a curated dataset for soccer video summarization, designed to serve as a benchmark for the task. The dataset includes shot boundaries for 237 matches from the Spanish, French, and Italian leagues, using broadcast footage sourced from the SoccerNet dataset. Alongside the dataset, we propose a baseline model specifically designed for this task, which achieves an F1 score of 0.3956 on the test set. Furthermore, we propose a new metric constrained by the length of each target summary, enabling a more objective evaluation of the generated content. The dataset and code are available at https://ipcv.github.io/SoccerHigh/.
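One plausible reading of a metric "constrained by the length of each target summary" is that every generated summary is limited to the duration of its reference before it is scored. The following is a minimal sketch under assumed inputs: each shot is a hypothetical (start, end, score) triple built from the annotated shot boundaries and a model's per-shot score, and shots are picked greedily by score until the target length is filled. The greedy rule is a swapped-in illustration, not necessarily the paper's procedure; a 0/1 knapsack over shot durations is a common alternative in video summarization work.

```python
from typing import List, Tuple

# Hypothetical shot format: (start_s, end_s, predicted_score).
Shot = Tuple[float, float, float]


def select_shots(shots: List[Shot], target_length: float) -> List[Tuple[float, float]]:
    """Greedily keep the highest-scoring shots whose total duration fits target_length.

    Shots that would push the summary past the budget are skipped, so the
    result never exceeds the reference summary's length.
    """
    selected = []
    used = 0.0
    # Visit shots from most to least confident.
    for start, end, _score in sorted(shots, key=lambda s: s[2], reverse=True):
        duration = end - start
        if used + duration <= target_length:
            selected.append((start, end))
            used += duration
    # Restore chronological order so the summary plays as a coherent reel.
    return sorted(selected)
```

For example, with shots (0, 8, 0.2), (8, 20, 0.9), (20, 26, 0.7), (26, 40, 0.6) and a 20-second target, the sketch keeps (8, 20) and (20, 26), 18 seconds in total, because adding either remaining shot would exceed the budget. The selected segments can then be scored against the reference highlights with a temporal metric such as an overlap-based F1.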

Problem

Research questions and friction points this paper is trying to address.

Lack of public datasets for sports video summarization

Need for a benchmark in automatic soccer highlight generation

Objective evaluation of generated summaries

Innovation

Methods, ideas, or system contributions that make the work stand out.

Curated benchmark dataset with shot boundaries for 237 soccer matches

Baseline model achieving an F1 score of 0.3956 on the test set

New length-constrained evaluation metric
