PHI: Bridging Domain Shift in Long-Term Action Quality Assessment via Progressive Hierarchical Instruction

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
In long-term action quality assessment (AQA), pre-trained action recognition backbones suffer from both task-level and feature-level domain shifts relative to the AQA objective; the feature-level misalignment is particularly harmful because fine-tuning large backbones on small AQA datasets is impractical. To address this, we propose the Progressive Hierarchical Instruction (PHI) framework. PHI mitigates domain shift without fine-tuning the pre-trained backbone: a Gap Minimization Flow (GMF) performs coarse-grained cross-domain alignment, and List-wise Contrastive Regularization (LCR) enforces fine-grained ranking consistency. By integrating flow matching, temporally-enhanced attention, and listwise contrastive learning, PHI achieves state-of-the-art performance on three mainstream long-term AQA benchmarks. It significantly alleviates domain shift, improves accuracy, and enhances generalization, especially in data-scarce regimes, while preserving backbone integrity and reducing computational overhead.

📝 Abstract
Long-term Action Quality Assessment (AQA) aims to evaluate the quantitative performance of actions in long videos. However, existing methods face challenges due to domain shifts between the pre-trained large-scale action recognition backbones and the specific AQA task, thereby hindering their performance. This arises because fine-tuning resource-intensive backbones on small AQA datasets is impractical. We address this by identifying two levels of domain shift: task-level, regarding differences in task objectives, and feature-level, regarding differences in important features. For feature-level shifts, which are more detrimental, we propose Progressive Hierarchical Instruction (PHI) with two strategies. First, Gap Minimization Flow (GMF) leverages flow matching to progressively learn a fast flow path that reduces the domain gap between initial and desired features across shallow to deep layers. Additionally, a temporally-enhanced attention module captures long-range dependencies essential for AQA. Second, List-wise Contrastive Regularization (LCR) facilitates coarse-to-fine alignment by comprehensively comparing batch pairs to learn fine-grained cues while mitigating domain shift. Integrating these modules, PHI offers an effective solution. Experiments demonstrate that PHI achieves state-of-the-art performance on three representative long-term AQA datasets, proving its superiority in addressing the domain shift for long-term AQA.
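The abstract says GMF leverages flow matching to move initial backbone features toward desired AQA features along a learned flow path. The paper's exact GMF formulation is not given here, but the standard conditional flow-matching objective it builds on can be sketched in a few lines of numpy; all variable names below are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(x0, x1, v_pred):
    """Conditional flow-matching objective on a straight path.

    x0: initial backbone features (source domain), shape (B, D)
    x1: desired AQA-aligned features (target), shape (B, D)
    v_pred: velocity predicted by a network at the interpolated point x_t
    Along the linear path x_t = (1 - t) * x0 + t * x1, the regression
    target is the constant velocity x1 - x0.
    """
    v_target = x1 - x0
    return float(np.mean((v_pred - v_target) ** 2))

# Toy stand-ins for shallow-layer backbone output vs. desired features.
x0 = rng.normal(size=(4, 8))
x1 = rng.normal(size=(4, 8))
t = rng.uniform(size=(4, 1))
x_t = (1 - t) * x0 + t * x1  # point on the flow path (velocity-network input)

perfect = flow_matching_loss(x0, x1, x1 - x0)        # oracle velocity
untrained = flow_matching_loss(x0, x1, np.zeros_like(x0))
```

In PHI this loss would supervise a small velocity module per layer while the backbone stays frozen; an oracle predictor drives the loss to zero, and integrating the learned velocity field transports features across the domain gap.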
Problem

Research questions and friction points this paper is trying to address.

Addressing domain shift in long-term action quality assessment
Reducing feature-level gaps between pre-trained models and AQA tasks
Enhancing performance on small datasets without fine-tuning backbones
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive Hierarchical Instruction (PHI) reduces domain shift
Gap Minimization Flow (GMF) learns a fast flow path for cross-domain feature alignment
List-wise Contrastive Regularization (LCR) enables fine-grained alignment
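The LCR idea, comparing all samples in a batch so that predicted quality scores preserve the ground-truth ranking, can be illustrated with a ListNet-style listwise loss. This is a generic sketch of listwise ranking supervision, not the paper's exact LCR formulation; the function and variable names are mine.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))        # subtract max for numerical stability
    return e / e.sum()

def listwise_loss(pred_scores, true_scores):
    """ListNet-style loss: cross-entropy between the top-one probability
    distributions induced by ground-truth and predicted quality scores.
    Low loss means the predicted list preserves the true ranking."""
    p = softmax(true_scores)
    q = softmax(pred_scores)
    return float(-np.sum(p * np.log(q + 1e-12)))

true_q = [9.0, 7.5, 4.0]    # ground-truth quality scores in a batch
aligned = [2.0, 1.0, -1.0]  # predictions ranked in the same order
reversed_ = [-1.0, 1.0, 2.0]  # predictions that invert the order
```

Because the loss compares the whole batch as a list rather than isolated pairs, order-preserving predictions score strictly lower than order-inverting ones, which is the fine-grained ranking consistency LCR is after.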
Kanglei Zhou
Beihang University
Computer Vision, Virtual/Augmented Reality
Hubert P. H. Shum
Professor of Visual Computing, Director of Research in Computer Science, Durham University
Responsible AI, Computer Vision, Computer Graphics, AI in Healthcare
Frederick W. B. Li
Department of Computer Science, Durham University, DH1 3LE Durham, U.K.
Xingxing Zhang
Department of Computer Science and Technology, Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University
Xiaohui Liang
University of Massachusetts Boston
Mobile Healthcare, Voice Technology, Internet of Things, Privacy