OMG-Bench: A New Challenging Benchmark for Skeleton-based Online Micro Hand Gesture Recognition

📅 2025-12-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the scarcity of public data, the high cost of skeletal annotation, and the poor generalizability of existing methods in online micro-gesture recognition, this paper introduces OMG-Bench, the first large-scale, fine-grained benchmark for skeleton-based online micro-gesture recognition, comprising 40 gesture classes and 13,948 instances with precise frame-level annotations. To build the benchmark, OMG-Bench combines multi-view self-supervised skeleton reconstruction with a semi-automatic annotation pipeline in which heuristic rules and expert refinement together ensure annotation accuracy and scalability. The authors further propose the Hierarchical Memory-Augmented Transformer (HMATr), which jointly models temporal context and gesture localization via a hierarchical memory bank and learnable position-aware queries. Experiments show that HMATr improves detection performance by 7.6% over state-of-the-art methods. OMG-Bench fills a critical gap as the first high-quality, publicly available benchmark for this task, establishing a robust foundation for future research in online micro-gesture recognition.

📝 Abstract
Online micro gesture recognition from hand skeletons is critical for VR/AR interaction but faces challenges due to limited public datasets and task-specific algorithms. Micro gestures involve subtle motion patterns, which make constructing datasets with precise skeletons and frame-level annotations difficult. To this end, we develop a multi-view self-supervised pipeline to automatically generate skeleton data, complemented by heuristic rules and expert refinement for semi-automatic annotation. Based on this pipeline, we introduce OMG-Bench, the first large-scale public benchmark for skeleton-based online micro gesture recognition. It features 40 fine-grained gesture classes with 13,948 instances across 1,272 sequences, characterized by subtle motions, rapid dynamics, and continuous execution. To tackle these challenges, we propose Hierarchical Memory-Augmented Transformer (HMATr), an end-to-end framework that unifies gesture detection and classification by leveraging hierarchical memory banks which store frame-level details and window-level semantics to preserve historical context. In addition, it employs learnable position-aware queries initialized from the memory to implicitly encode gesture positions and semantics. Experiments show that HMATr outperforms state-of-the-art methods by 7.6% in detection rate, establishing a strong baseline for online micro gesture recognition. Project page: https://omg-bench.github.io/
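The hierarchical memory bank described above (fine-grained frame-level details plus coarser window-level semantics) can be caricatured in a few lines of Python. This is a minimal sketch for intuition only: the class name, buffer capacities, and the mean-pooling window summarizer are illustrative assumptions, not the paper's actual HMATr design.

```python
from collections import deque

class HierarchicalMemoryBank:
    """Toy sketch of a two-level memory for online recognition:
    a short ring buffer of frame features, plus mean-pooled
    window summaries that preserve longer-range context."""

    def __init__(self, frame_capacity=64, window_size=16, window_capacity=8):
        self.frames = deque(maxlen=frame_capacity)    # frame-level details
        self.windows = deque(maxlen=window_capacity)  # window-level semantics
        self.window_size = window_size
        self._pending = []                            # frames awaiting pooling

    def push(self, feature):
        """Ingest one frame feature (a list of floats)."""
        self.frames.append(feature)
        self._pending.append(feature)
        if len(self._pending) == self.window_size:
            dim = len(feature)
            # Mean-pool the finished window into one coarse summary.
            summary = [sum(f[d] for f in self._pending) / self.window_size
                       for d in range(dim)]
            self.windows.append(summary)
            self._pending.clear()

    def context(self):
        """Coarse-to-fine historical context handed to the model."""
        return list(self.windows) + list(self.frames)
```

In an online setting, old frames fall out of the fine buffer while their pooled summaries linger in the coarse buffer, which is the basic trade-off such a memory hierarchy buys.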
Problem

Research questions and friction points this paper is trying to address.

Develops a benchmark for skeleton-based online micro hand gesture recognition
Proposes a framework to unify gesture detection and classification
Addresses challenges of subtle motions and limited public datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-view self-supervised pipeline for skeleton generation
Hierarchical memory-augmented transformer for detection and classification
Learnable position-aware queries from memory for context encoding
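The last idea above, queries initialized from memory, can be sketched as well. Assuming (this is an illustrative guess, not the paper's formulation) that each decoder query starts from a pooled memory summary plus a learnable per-query positional embedding:

```python
def init_position_aware_queries(memory, pos_embeddings):
    """Toy sketch: each query = pooled memory (semantics)
    + a learnable positional embedding (position prior).
    memory: list of feature vectors; pos_embeddings: one vector per query."""
    dim = len(memory[0])
    # Mean-pool the memory into a single semantic summary vector.
    pooled = [sum(m[d] for m in memory) / len(memory) for d in range(dim)]
    return [[pooled[d] + pos[d] for d in range(dim)]
            for pos in pos_embeddings]
```

The point of such an initialization is that queries begin the decoding step already biased toward where and what recent gestures looked like, rather than starting from scratch each frame.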
👥 Authors
Haochen Chang
Sun Yat-sen University
Pengfei Ren
Chinese Academy of Meteorological Sciences
Madden-Julian Oscillation, Climate Dynamics, Subseasonal to seasonal forecasts
Buyuan Zhang
Shanghai Jiao Tong University
Da Li
Nankai University
Tianhao Han
Shanghai Jiao Tong University
Haoyang Zhang
Ph.D. student of Computer Science, University of Illinois Urbana-Champaign
Computer Architecture, System Software
Liang Xie
Wuhan University of Technology
Time Series Forecasting, Cross-modal Learning
Hongbo Chen
Sun Yat-sen University
Erwei Yin
Academy of Military Sciences, Tianjin Artificial Intelligence Innovation Center