🤖 AI Summary
Existing one-class action anomaly detection methods, which train a separate model for each action category, scale poorly and rely heavily on large volumes of normal training samples. To address these limitations, this paper proposes a unified few-shot contrastive learning framework that constructs a class-agnostic representation space, enabling cross-category anomaly detection using only a small support set of normal action sequences. A key innovation is a generative motion augmentation strategy built on a diffusion-based foundation model, which synthesizes diverse, realistic normal action sequences to enhance intra-class robustness and cross-category generalization. Evaluated on the HumanAct12 benchmark, the method achieves state-of-the-art performance under both seen and unseen category settings while improving training efficiency and model scalability. This enables rapid adaptation to novel action categories and effective deployment in data-scarce scenarios.
📝 Abstract
Human Action Anomaly Detection (HAAD) aims to identify anomalous actions given only normal action data during training. Existing methods typically follow a one-model-per-category paradigm, requiring separate training for each action category and a large number of normal samples. These constraints hinder scalability and limit applicability in real-world scenarios, where data is often scarce or novel categories frequently appear. To address these limitations, we propose a unified framework for HAAD that is compatible with few-shot scenarios. Our method constructs a category-agnostic representation space via contrastive learning, enabling anomaly detection (AD) by comparing test samples with a small given set of normal examples (referred to as the support set). To improve inter-category generalization and intra-category robustness, we introduce a generative motion augmentation strategy that harnesses a diffusion-based foundation model to create diverse and realistic training samples. To the best of our knowledge, our work is the first to introduce such a strategy specifically tailored to enhance contrastive learning for action AD. Extensive experiments on the HumanAct12 dataset demonstrate that our approach achieves state-of-the-art effectiveness under both seen and unseen category settings, along with advantages in training efficiency and model scalability for few-shot HAAD.
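To make "comparing test samples with a small support set in a shared embedding space" concrete, here is a minimal sketch under stated assumptions. The abstract does not specify the encoder, the scoring rule, or the exact contrastive loss, so everything below is hypothetical: it assumes each action sequence has already been mapped to a fixed-length embedding, scores a test sample as one minus its highest cosine similarity to any support embedding, and uses a generic InfoNCE-style loss in which a generated (augmented) variant of a sequence acts as the positive. The function names `anomaly_score` and `info_nce` are illustrative only and are not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings of equal dimension."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def anomaly_score(query: Vector, support: Sequence[Vector]) -> float:
    """Assumed scoring rule: 1 - max cosine similarity to the support set.

    Higher means less similar to every normal example, i.e. more anomalous.
    Values lie (up to floating-point rounding) in [0, 2].
    """
    if not support:
        raise ValueError("support set must contain at least one normal sample")
    return 1.0 - max(cosine_similarity(query, s) for s in support)


def info_nce(anchor: Vector, positive: Vector,
             negatives: Sequence[Vector], temperature: float = 0.1) -> float:
    """Generic InfoNCE loss for one anchor (assumed, not the paper's exact loss).

    Here `positive` stands in for, e.g., a generatively augmented version of
    the anchor motion; `negatives` are embeddings of other sequences.
    Returns -log(exp(s_pos / t) / sum_i exp(s_i / t)).
    """
    logits = [cosine_similarity(anchor, positive) / temperature]
    logits += [cosine_similarity(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    # Toy 3-D embeddings standing in for encoded action sequences.
    support = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
    normal_query = [0.95, 0.05, 0.0]
    anomalous_query = [0.0, 0.0, 1.0]
    print(anomaly_score(normal_query, support))     # small score
    print(anomaly_score(anomalous_query, support))  # 1.0: orthogonal to all support samples
```

Nearest-neighbour scoring against the support set is one simple choice that needs no per-category retraining: adapting to a new action category only requires swapping in a few new support embeddings, which matches the few-shot, category-agnostic setting the abstract describes. Averaging over the top-k neighbours or thresholding the score are equally plausible variants.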