Decoupling Spatio-Temporal Adapter for Fine-Grained Badminton Action Localization

📅 2026-05-22
🤖 AI Summary
This work addresses fine-grained action localization in professional badminton videos, where complex spatio-temporal dynamics make visually similar strokes hard to distinguish. It proposes the Decoupling Spatio-Temporal Adapter (DSTA), which introduces decoupled modeling to this task for the first time: three parallel branches separately capture temporal dynamics, vertical spatial variations, and horizontal spatial variations. Because DSTA operates within a parameter-efficient framework, it models subtle motion differences while adding only marginal parameters and computation. Evaluated on the newly curated Fine-Badminton dataset and the established ShuttleSet benchmark, the method achieves state-of-the-art performance and improves discriminability among highly similar actions.
📝 Abstract
Temporal Action Localization (TAL) has been extensively studied in generic video understanding, but fine-grained sports scenarios, such as professional badminton, remain underexplored due to their complex and subtle spatio-temporal dynamics. In this paper, we focus on fine-grained TAL in professional badminton videos and introduce a new benchmark dataset, Fine-Badminton, which consists of 31 matches with 29 fine-grained stroke categories, covering 2,104 rallies and 27,597 annotated actions. To capture the intricate motion patterns in such scenarios, we propose a Decoupling Spatio-Temporal Adapter (DSTA), which models spatio-temporal features within a parameter-efficient framework. Specifically, DSTA decomposes motion representation into three parallel branches that capture temporal dynamics, vertical spatial variations, and horizontal spatial variations. This design allows the model to better distinguish subtle differences among fine-grained actions. Extensive experiments on both the Fine-Badminton dataset and the ShuttleSet benchmark demonstrate that the proposed method achieves state-of-the-art performance while introducing only a marginal increase in computational and parameter cost. These results validate the effectiveness and efficiency of the proposed approach for fine-grained temporal action localization.
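The three-branch decomposition described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the layer types, so the bottleneck projections, the depthwise 1D kernels along the time, height, and width axes, the ReLU, the summation of branches, the zero-initialized up-projection, and all names (`DecoupledAdapterSketch`, `depthwise_conv1d`, `bottleneck`, `kernel_size`) are assumptions chosen to show one plausible parameter-efficient layout.

```python
import numpy as np


def depthwise_conv1d(x, kernel, axis):
    """Depthwise 1D filter with zero 'same' padding along one axis.

    x: array of shape (T, H, W, R); kernel: array of shape (K, R), K odd.
    Each of the R channels is filtered independently.
    """
    k = kernel.shape[0]
    pad = k // 2
    pad_width = [(0, 0)] * x.ndim
    pad_width[axis] = (pad, pad)
    xp = np.pad(x, pad_width)
    n = x.shape[axis]
    out = np.zeros_like(x)
    for i in range(k):
        window = [slice(None)] * x.ndim
        window[axis] = slice(i, i + n)
        out += xp[tuple(window)] * kernel[i]  # kernel[i] broadcasts over channels
    return out


class DecoupledAdapterSketch:
    """Hypothetical adapter: down-project, three parallel axis-wise branches, up-project."""

    def __init__(self, channels, bottleneck=8, kernel_size=3, seed=0):
        rng = np.random.default_rng(seed)
        self.w_down = rng.normal(0.0, 0.02, (channels, bottleneck))
        self.k_t = rng.normal(0.0, 0.02, (kernel_size, bottleneck))  # temporal branch
        self.k_h = rng.normal(0.0, 0.02, (kernel_size, bottleneck))  # vertical branch
        self.k_w = rng.normal(0.0, 0.02, (kernel_size, bottleneck))  # horizontal branch
        # Assumption: zero init so the adapter starts as an identity mapping.
        self.w_up = np.zeros((bottleneck, channels))

    def num_params(self):
        return sum(p.size for p in (self.w_down, self.k_t, self.k_h, self.k_w, self.w_up))

    def __call__(self, x):
        z = x @ self.w_down  # (T, H, W, C) -> (T, H, W, R)
        z = (
            depthwise_conv1d(z, self.k_t, axis=0)    # temporal dynamics
            + depthwise_conv1d(z, self.k_h, axis=1)  # vertical spatial variations
            + depthwise_conv1d(z, self.k_w, axis=2)  # horizontal spatial variations
        )
        z = np.maximum(z, 0.0)  # ReLU
        return x + z @ self.w_up  # residual connection around the adapter


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 6, 16))  # toy features: T=4 frames, 6x6 grid, C=16
adapter = DecoupledAdapterSketch(channels=16)
y = adapter(x)
print(y.shape, adapter.num_params())
```

In this sketch each branch costs only `kernel_size * bottleneck` weights, so almost all added parameters sit in the two bottleneck projections, which is one way an adapter of this kind could keep its overhead marginal relative to a frozen backbone.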
Problem

Research questions and friction points this paper is trying to address.

Temporal Action Localization
Fine-Grained Action Recognition
Badminton Video Analysis
Spatio-Temporal Dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupling Spatio-Temporal Adapter
Fine-Grained Action Localization
Temporal Action Localization
Parameter-Efficient Modeling
Badminton Video Analysis