Towards Open-World Human Action Segmentation Using Graph Convolutional Networks

📅 2025-07-01

📈 Citations: 0

✨ Influential: 0

career value

230K/year

🤖 AI Summary

Addressing the challenge of detecting and segmenting out-of-distribution (OOD) actions in open-world action segmentation, this paper proposes the first end-to-end framework. It introduces an enhanced pyramid graph convolutional network to model multi-scale spatiotemporal dependencies; a Mixup-based unlabeled anomalous action synthesis strategy to mitigate OOD sample scarcity; and a temporal clustering loss jointly optimizing action segmentation and open-set recognition. Evaluated on Bimanual Actions and H2O datasets, our method achieves significant improvements: +16.9% in open-set segmentation F1@50 and +34.6% in OOD detection AUROC. These results demonstrate strong generalization under dynamic real-world conditions and establish a novel paradigm for applications such as assistive robotics and healthcare.

Technology Category

Application Category

📝 Abstract

Human-object interaction segmentation is a fundamental task of daily activity understanding, which plays a crucial role in applications such as assistive robotics, healthcare, and autonomous systems. Most existing learning-based methods excel in closed-world action segmentation, they struggle to generalize to open-world scenarios where novel actions emerge. Collecting exhaustive action categories for training is impractical due to the dynamic diversity of human activities, necessitating models that detect and segment out-of-distribution actions without manual annotation. To address this issue, we formally define the open-world action segmentation problem and propose a structured framework for detecting and segmenting unseen actions. Our framework introduces three key innovations: 1) an Enhanced Pyramid Graph Convolutional Network (EPGCN) with a novel decoder module for robust spatiotemporal feature upsampling. 2) Mixup-based training to synthesize out-of-distribution data, eliminating reliance on manual annotations. 3) A novel Temporal Clustering loss that groups in-distribution actions while distancing out-of-distribution samples. We evaluate our framework on two challenging human-object interaction recognition datasets: Bimanual Actions and 2 Hands and Object (H2O) datasets. Experimental results demonstrate significant improvements over state-of-the-art action segmentation models across multiple open-set evaluation metrics, achieving 16.9% and 34.6% relative gains in open-set segmentation (F1@50) and out-of-distribution detection performances (AUROC), respectively. Additionally, we conduct an in-depth ablation study to assess the impact of each proposed component, identifying the optimal framework configuration for open-world action segmentation.

Problem

Research questions and friction points this paper is trying to address.

Generalize action segmentation to open-world scenarios with novel actions

Detect and segment out-of-distribution actions without manual annotation

Improve robustness in human-object interaction recognition for dynamic activities

Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhanced Pyramid Graph Convolutional Network for spatiotemporal feature upsampling

Mixup-based training to synthesize out-of-distribution data

Temporal Clustering loss for grouping and distancing actions

🔎 Similar Papers

No similar papers found.

Bosch Group

ARENA2036 in Stuttgart

Research Scientist Intern, Machine Perception for Input and Interaction (PhD)