Learning Action Hierarchies via Hybrid Geometric Diffusion

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing temporal action segmentation methods struggle to explicitly model the hierarchical structure inherent in human activities. To address this limitation, this work proposes HybridTAS, a novel framework that introduces hyperbolic geometry into the denoising process of diffusion models. By fusing representations from Euclidean and hyperbolic spaces, HybridTAS progressively refines segmentation outputs during denoising. It moves from coarse-grained, high-level categories to fine-grained, specific actions, and thereby explicitly captures hierarchical dependencies among actions. The method achieves state-of-the-art performance on three standard benchmarks (GTEA, 50Salads, and Breakfast), demonstrating the effectiveness of hyperbolic-guided denoising for temporal action segmentation.
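The summary does not say which model of hyperbolic space HybridTAS uses or how the two geometries are fused. The sketch below assumes the Poincaré ball with curvature -c, a common choice in hyperbolic deep learning, and implements the standard exponential and logarithmic maps at the origin together with the geodesic distance. The `hybrid_logits` function, its `alpha` blending weight, and the prototype arrays are hypothetical; they illustrate one possible way to combine Euclidean and hyperbolic scores and are not the paper's method.

```python
import numpy as np

# Standard Poincaré-ball operations for curvature -c (c > 0).
# Generic sketch, not code from the HybridTAS paper.
EPS = 1e-7


def expmap0(v: np.ndarray, c: float = 1.0) -> np.ndarray:
    """Map Euclidean (tangent-space) vectors at the origin onto the ball."""
    sqrt_c = np.sqrt(c)
    norm = np.clip(np.linalg.norm(v, axis=-1, keepdims=True), EPS, None)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)


def logmap0(y: np.ndarray, c: float = 1.0) -> np.ndarray:
    """Inverse of expmap0: pull points on the ball back to the tangent space."""
    sqrt_c = np.sqrt(c)
    norm = np.clip(np.linalg.norm(y, axis=-1, keepdims=True), EPS, None)
    arg = np.clip(sqrt_c * norm, 0.0, 1.0 - EPS)  # keep artanh finite
    return np.arctanh(arg) * y / (sqrt_c * norm)


def poincare_dist(x: np.ndarray, y: np.ndarray, c: float = 1.0) -> np.ndarray:
    """Geodesic distance on the ball; broadcasts over leading dimensions."""
    diff2 = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - c * np.sum(x**2, axis=-1)) * (1.0 - c * np.sum(y**2, axis=-1))
    arg = 1.0 + 2.0 * c * diff2 / np.clip(denom, EPS, None)
    return np.arccosh(arg) / np.sqrt(c)


def hybrid_logits(feats, euc_protos, hyp_protos, alpha=0.5, c=1.0):
    """Hypothetical fusion: blend Euclidean dot-product scores with negative
    hyperbolic distances to class prototypes that live on the ball.

    feats: (frames, D); euc_protos: (K, D); hyp_protos: (K, D) inside the ball.
    Returns per-frame class scores of shape (frames, K).
    """
    euc = feats @ euc_protos.T                      # Euclidean similarity
    z = expmap0(feats, c)                           # lift frames onto the ball
    hyp = -poincare_dist(z[:, None, :], hyp_protos[None, :, :], c)
    return alpha * euc + (1.0 - alpha) * hyp


rng = np.random.default_rng(0)
feats = 0.3 * rng.normal(size=(5, 8))               # 5 frames, 8-dim features
euc_protos = rng.normal(size=(4, 8))                # 4 hypothetical action classes
hyp_protos = expmap0(0.5 * rng.normal(size=(4, 8)))
scores = hybrid_logits(feats, euc_protos, hyp_protos)
print(scores.shape, scores.argmax(axis=1))          # per-frame predicted class
```

A common intuition behind such designs is that distances grow rapidly toward the boundary of the ball, so general (root) categories can sit near the origin while specific (leaf) actions spread out near the boundary.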

📝 Abstract
Temporal action segmentation is a critical task in video understanding, where the goal is to assign action labels to each frame in a video. While recent advances leverage iterative refinement-based strategies, they fail to explicitly utilize the hierarchical nature of human actions. In this work, we propose HybridTAS, a novel framework that incorporates a hybrid of Euclidean and hyperbolic geometries into the denoising process of diffusion models to exploit the hierarchical structure of actions. Hyperbolic geometry naturally encodes tree-like relationships between embeddings, enabling us to guide the action label denoising process in a coarse-to-fine manner: higher diffusion timesteps are influenced by abstract, high-level action categories (root nodes), while lower timesteps are refined using fine-grained action classes (leaf nodes). Extensive experiments on three benchmark datasets, GTEA, 50Salads, and Breakfast, demonstrate that our method achieves state-of-the-art performance, validating the effectiveness of hyperbolic-guided denoising for the temporal action segmentation task.
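The abstract states that higher diffusion timesteps are guided by high-level categories and lower timesteps by fine-grained classes, but it does not give the exact mechanism. As a hypothetical illustration of that coarse-to-fine idea, the sketch below builds a per-timestep target distribution over fine-grained classes. At t = T it specifies only the parent category (uniform over sibling leaves), and at t = 0 it is a one-hot leaf label. The hierarchy `LEAF_TO_PARENT`, the function name, and the linear weight w(t) = t / T are all assumptions.

```python
import numpy as np

# Hypothetical coarse-to-fine target schedule (not the paper's mechanism).


def coarse_to_fine_target(leaf: int, t: int, T: int, leaf_to_parent: list) -> np.ndarray:
    """Distribution over fine-grained (leaf) classes at diffusion step t in [0, T].

    t = T (noisiest step): mass is spread uniformly over every leaf that shares
    `leaf`'s parent, so only the high-level category is specified.
    t = 0 (cleanest step): one-hot on the fine-grained leaf class.
    """
    parents = np.asarray(leaf_to_parent)
    fine = np.zeros(len(parents))
    fine[leaf] = 1.0
    siblings = (parents == parents[leaf]).astype(float)
    coarse = siblings / siblings.sum()
    w = t / T  # weight on the coarse target; grows with the noise level
    return w * coarse + (1.0 - w) * fine


# Invented two-level hierarchy: leaves 0-1 belong to parent 0, leaves 2-4 to parent 1.
LEAF_TO_PARENT = [0, 0, 1, 1, 1]
T = 10
for t in (T, T // 2, 0):
    print(t, np.round(coarse_to_fine_target(2, t, T, LEAF_TO_PARENT), 3))
```

A linear schedule is simply the easiest to read. A cosine or step schedule, or guidance applied through hyperbolic distances instead of label mixing, would match the abstract's description equally well.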
Problem

Research questions and friction points this paper is trying to address.

temporal action segmentation
action hierarchies
video understanding
hierarchical structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

hybrid geometry
hyperbolic diffusion
temporal action segmentation
action hierarchy
denoising diffusion models
Arjun Ramesh Kaushik
University at Buffalo, The State University of New York
Computer Vision, Multimodal AI, Privacy
N. Ratha
University at Buffalo, SUNY
V. Govindaraju
University at Buffalo, SUNY