🤖 AI Summary
Long-term Action Quality Assessment (AQA), as in figure skating, faces two key challenges: models of long-range temporal dynamics are prone to spurious correlations, and existing methods lack robustness to contextual confounders. To address these, we propose CaFlow, which integrates a Causal Counterfactual Regularization (CCR) module with a bidirectional time-conditioned flow module (BiT-Flow); it is the first framework to jointly leverage causal disentanglement, counterfactual intervention, and bidirectional temporal modeling. CCR uses counterfactual interventions to separate causal features from confounding ones, and BiT-Flow enforces a cycle-consistency constraint between forward and backward dynamics to stabilize temporal representations. Crucially, the method operates without frame-level annotations, relying solely on video-level quality scores for supervision. Evaluated on multiple long-term AQA benchmarks, it achieves state-of-the-art performance, significantly outperforming prior unidirectional and annotation-intensive approaches. The code is publicly available.
📝 Abstract
Action Quality Assessment (AQA) predicts fine-grained execution scores from action videos and is widely applied in sports, rehabilitation, and skill evaluation. Long-term AQA, as in figure skating or rhythmic gymnastics, is especially challenging since it requires modeling extended temporal dynamics while remaining robust to contextual confounders. Existing approaches either depend on costly annotations or rely on unidirectional temporal modeling, making them vulnerable to spurious correlations and unstable long-term representations. To this end, we propose CaFlow, a unified framework that integrates counterfactual de-confounding with bidirectional time-conditioned flow. The Causal Counterfactual Regularization (CCR) module disentangles causal and confounding features in a self-supervised manner and enforces causal robustness through counterfactual interventions, while the BiT-Flow module models forward and backward dynamics with a cycle-consistency constraint to produce smoother and more coherent representations. Extensive experiments on multiple long-term AQA benchmarks demonstrate that CaFlow achieves state-of-the-art performance. Code is available at https://github.com/Harrison21/CaFlow
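The two regularizers named in the abstract can be illustrated with a toy sketch. This is not the CaFlow implementation: the affine time-conditioned maps, the function names (`forward_step`, `backward_step`, `cycle_consistency_loss`, `counterfactual_penalty`), and the scoring lambdas are all hypothetical and chosen only to show the general idea. A cycle-consistency term penalizes the gap between a feature and its forward-then-backward reconstruction, and a counterfactual term penalizes any change in the predicted score when only the confounding part of the input is replaced.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def forward_step(x: Sequence[float], t: float, scale: float, shift: float) -> Vector:
    """Hypothetical time-conditioned forward map: x -> x * exp(scale * t) + shift * t."""
    return [xi * math.exp(scale * t) + shift * t for xi in x]


def backward_step(y: Sequence[float], t: float, scale: float, shift: float) -> Vector:
    """Hypothetical backward map; it inverts forward_step when the parameters match."""
    return [(yi - shift * t) * math.exp(-scale * t) for yi in y]


def cycle_consistency_loss(
    x: Sequence[float],
    t: float,
    fwd_params: Tuple[float, float],
    bwd_params: Tuple[float, float],
) -> float:
    """Mean squared error between x and its forward-then-backward reconstruction."""
    y = forward_step(x, t, *fwd_params)
    x_rec = backward_step(y, t, *bwd_params)
    return sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)


def counterfactual_penalty(
    score_fn: Callable[[Sequence[float], Sequence[float]], float],
    causal: Sequence[float],
    confounder: Sequence[float],
    swapped_confounder: Sequence[float],
) -> float:
    """Squared change in the predicted score when only the confounder is replaced."""
    factual = score_fn(causal, confounder)
    counterfactual = score_fn(causal, swapped_confounder)
    return (factual - counterfactual) ** 2


# Toy inputs (assumed values, for illustration only).
features = [0.5, -1.0, 2.0]
matched = cycle_consistency_loss(features, 0.3, (0.8, 0.1), (0.8, 0.1))
mismatched = cycle_consistency_loss(features, 0.3, (0.8, 0.1), (0.2, 0.1))

# A scorer that ignores the confounder versus one that leaks it into the score.
robust_score = lambda c, z: sum(c)
leaky_score = lambda c, z: sum(c) + 0.5 * sum(z)
robust_pen = counterfactual_penalty(robust_score, features, [1.0, 2.0], [0.0, 0.0])
leaky_pen = counterfactual_penalty(leaky_score, features, [1.0, 2.0], [0.0, 0.0])
```

In this toy setting the cycle loss is essentially zero when the backward map inverts the forward one and grows when the two disagree, while the counterfactual penalty is zero only for the scorer that does not depend on the confounder. In a learned model both terms would be added to the score-regression loss with weights that are not specified here.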