AdaSpot: Spend Resolution Where It Matters for Precise Event Spotting

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inefficiency and detail loss in existing video event localization methods, which uniformly process all frames and ignore spatiotemporal redundancy, often sacrificing critical spatial details through downsampling. To overcome these limitations, we propose AdaSpot, a novel framework that combines low-resolution global feature extraction with adaptive high-resolution focusing on informative local regions, thereby achieving both computational efficiency and fine-grained localization while preserving spatiotemporal consistency. The key innovation lies in an unsupervised, task-aware region selection mechanism that circumvents the training instability associated with learnable attention strategies. Experiments demonstrate that AdaSpot achieves state-of-the-art performance, improving mAP@0 by 3.96 and 2.26 on the Tennis and FineDiving datasets, respectively, while significantly reducing computational overhead.

📝 Abstract
Precise Event Spotting aims to localize fast-paced actions or events in videos with high temporal precision, a key task for applications in sports analytics, robotics, and autonomous systems. Existing methods typically process all frames uniformly, overlooking the inherent spatio-temporal redundancy in video data. This leads to redundant computation on non-informative regions while limiting overall efficiency. To remain tractable, they often spatially downsample inputs, losing fine-grained details crucial for precise localization. To address these limitations, we propose AdaSpot, a simple yet effective framework that processes low-resolution videos to extract global task-relevant features while adaptively selecting the most informative region-of-interest in each frame for high-resolution processing. The selection is performed via an unsupervised, task-aware strategy that maintains spatio-temporal consistency across frames and avoids the training instability of learnable alternatives. This design preserves essential fine-grained visual cues with a marginal computational overhead compared to low-resolution-only baselines, while remaining far more efficient than uniform high-resolution processing. Experiments on standard PES benchmarks demonstrate that AdaSpot achieves state-of-the-art performance under strict evaluation metrics (e.g., +3.96 and +2.26 mAP@0 frames on Tennis and FineDiving), while also maintaining strong results under looser metrics. Code is available at: https://github.com/arturxe2/AdaSpot
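The paper does not spell out the selection rule here, but the abstract's idea of a training-free, task-aware crop choice with spatio-temporal consistency can be illustrated with a minimal sketch: given a per-frame saliency map derived from the low-resolution pass (assumed input), pick the crop window with the highest total saliency and smooth its center across frames so the high-resolution crop does not jitter. The function name, the integral-image trick, and the exponential smoothing are illustrative choices, not AdaSpot's actual implementation.

```python
import numpy as np

def select_roi(saliency, crop, prev_center=None, alpha=0.7):
    """Return the top-left (row, col) of the crop-by-crop window with the
    highest total saliency, smoothed toward the previous frame's center."""
    H, W = saliency.shape
    # Integral image gives every window sum in O(1) after O(H*W) setup.
    ii = np.zeros((H + 1, W + 1))
    ii[1:, 1:] = saliency.cumsum(0).cumsum(1)
    sums = (ii[crop:, crop:] - ii[:-crop, crop:]
            - ii[crop:, :-crop] + ii[:-crop, :-crop])
    r, c = np.unravel_index(sums.argmax(), sums.shape)
    center = np.array([r + crop / 2, c + crop / 2])
    if prev_center is not None:
        # Exponential smoothing keeps the crop temporally consistent.
        center = alpha * np.asarray(prev_center) + (1 - alpha) * center
    r = int(np.clip(round(center[0] - crop / 2), 0, H - crop))
    c = int(np.clip(round(center[1] - crop / 2), 0, W - crop))
    return (r, c), center

# Usage: track the most salient region across a short clip.
rng = np.random.default_rng(0)
clip = rng.random((4, 16, 16))  # 4 frames of 16x16 saliency maps
center = None
for frame in clip:
    (r, c), center = select_roi(frame, crop=4, prev_center=center)
```

The returned top-left coordinates would then be rescaled to the original video resolution to extract the high-resolution patch that the fine-grained branch processes.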
Problem

Research questions and friction points this paper is trying to address.

Precise Event Spotting
Temporal Localization
Spatio-temporal Redundancy
Computational Efficiency
Fine-grained Details
Innovation

Methods, ideas, or system contributions that make the work stand out.

AdaSpot
Adaptive Region Selection
Precise Event Spotting
Spatio-temporal Consistency
Unsupervised Task-aware Strategy