Towards Open-World Human Action Segmentation Using Graph Convolutional Networks

📅 2025-07-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenge of detecting and segmenting out-of-distribution (OOD) actions in open-world action segmentation, this paper proposes the first end-to-end framework for the task. It introduces an enhanced pyramid graph convolutional network that models multi-scale spatiotemporal dependencies; a Mixup-based strategy that synthesizes unlabeled anomalous actions to mitigate OOD sample scarcity; and a temporal clustering loss that jointly optimizes action segmentation and open-set recognition. Evaluated on the Bimanual Actions and H2O datasets, the method achieves relative gains of 16.9% in open-set segmentation (F1@50) and 34.6% in OOD detection (AUROC) over state-of-the-art models. These results demonstrate strong generalization under dynamic real-world conditions, supporting applications such as assistive robotics and healthcare.

📝 Abstract
Human-object interaction segmentation is a fundamental task of daily activity understanding, which plays a crucial role in applications such as assistive robotics, healthcare, and autonomous systems. While most existing learning-based methods excel in closed-world action segmentation, they struggle to generalize to open-world scenarios where novel actions emerge. Collecting exhaustive action categories for training is impractical due to the dynamic diversity of human activities, necessitating models that detect and segment out-of-distribution actions without manual annotation. To address this issue, we formally define the open-world action segmentation problem and propose a structured framework for detecting and segmenting unseen actions. Our framework introduces three key innovations: 1) an Enhanced Pyramid Graph Convolutional Network (EPGCN) with a novel decoder module for robust spatiotemporal feature upsampling; 2) Mixup-based training that synthesizes out-of-distribution data, eliminating reliance on manual annotations; and 3) a novel Temporal Clustering loss that groups in-distribution actions while distancing out-of-distribution samples. We evaluate our framework on two challenging human-object interaction recognition datasets: Bimanual Actions and 2 Hands and Object (H2O). Experimental results demonstrate significant improvements over state-of-the-art action segmentation models across multiple open-set evaluation metrics, achieving 16.9% and 34.6% relative gains in open-set segmentation (F1@50) and out-of-distribution detection (AUROC), respectively. Additionally, we conduct an in-depth ablation study to assess the impact of each proposed component, identifying the optimal framework configuration for open-world action segmentation.
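The Temporal Clustering loss described in the abstract could be sketched roughly as follows. This is an illustration under assumptions, not the paper's formulation: the symbol names, the squared-hinge push term, and the `margin` value are all hypothetical; the paper only states that the loss groups in-distribution actions while distancing out-of-distribution samples.

```python
import torch
import torch.nn.functional as F

def temporal_clustering_loss(embeddings, labels, centroids, ood_embeddings, margin=1.0):
    """Illustrative clustering-style loss (assumed form, not the paper's exact one).

    embeddings:     (N, D) in-distribution frame features
    labels:         (N,) class index per frame
    centroids:      (C, D) one learnable centroid per known action class
    ood_embeddings: (M, D) features of synthesized anomalous samples
    """
    # Pull: squared distance of each in-distribution sample to its class centroid.
    pull = ((embeddings - centroids[labels]) ** 2).sum(dim=1).mean()
    # Push: OOD samples should stay at least `margin` away from every centroid.
    dists = torch.cdist(ood_embeddings, centroids)      # (M, C) pairwise distances
    push = F.relu(margin - dists).pow(2).mean()
    return pull + push
```

A margin-based push term is one common way to keep anomalous embeddings out of the known-class clusters; the paper may use a different formulation.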
Problem

Research questions and friction points this paper is trying to address.

Generalize action segmentation to open-world scenarios with novel actions
Detect and segment out-of-distribution actions without manual annotation
Improve robustness in human-object interaction recognition for dynamic activities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhanced Pyramid Graph Convolutional Network for spatiotemporal feature upsampling
Mixup-based training to synthesize out-of-distribution data
Temporal Clustering loss for grouping and distancing actions
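The Mixup-based synthesis listed above can be sketched as follows. This is a hedged sketch, not the paper's exact recipe: the function name, the feature-level blending, and the cross-class pairing rule are my assumptions. The core Mixup idea is standard: convex combinations of samples, with mixing weights drawn from a Beta distribution.

```python
import numpy as np

def synthesize_ood_features(feats, labels, alpha=0.2, rng=None):
    """Mixup-style pseudo-OOD synthesis (illustrative sketch).

    Blends pairs of in-distribution feature vectors drawn from *different*
    action classes; the interpolated result lies off the known-class
    manifolds and can be treated as an unlabeled anomalous sample.
    """
    rng = rng if rng is not None else np.random.default_rng()
    idx = rng.permutation(len(feats))
    # Mix only pairs whose labels differ, so the blend is off-manifold.
    mask = labels != labels[idx]
    lam = rng.beta(alpha, alpha, size=(int(mask.sum()), 1))
    mixed = lam * feats[mask] + (1.0 - lam) * feats[idx][mask]
    return mixed
```

Mixing at the feature level (rather than raw skeleton input) is one plausible design choice; either way, the synthesized samples remove the need to annotate real OOD actions.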
Hao Xing
Institute for Cognitive Systems, School of Computation, Information and Technology, Technical University of Munich, Arcisstraße 21, 80333 Munich, Germany
Kai Zhe Boey
Institute for Cognitive Systems, School of Computation, Information and Technology, Technical University of Munich, Arcisstraße 21, 80333 Munich, Germany
Gordon Cheng
Technical University of Munich
NeuroRobotics · NeuroEngineering · Imitation Learning · Cognitive Systems · Humanoid Robotics