Exploiting Spatiotemporal Properties for Efficient Event-Driven Human Pose Estimation

📅 2025-12-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the computational redundancy and loss of temporal resolution caused by converting event streams into dense frames in event-camera-based human pose estimation, this paper proposes an end-to-end point-cloud-driven approach. Instead of frame-based representations, it constructs spatiotemporal point clouds directly from raw asynchronous events. The method introduces an event time-slicing convolution module to capture millisecond-scale short-term dependencies, an event slice serialization mechanism for structured temporal modeling, and an edge-enhancement module embedded in the point cloud representation to improve spatial detail perception under sparse conditions. The approach is compatible with mainstream point cloud backbones, including PointNet, DGCNN, and Point Transformer. Evaluated on the DHP19 dataset, it significantly outperforms existing point-cloud-based baselines, with consistent gains in both accuracy and inference efficiency, demonstrating the effectiveness of explicitly exploiting the spatiotemporal sparsity inherent in event streams.
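The core idea in the summary, representing raw events as a spatiotemporal point cloud partitioned into temporal slices, can be sketched as follows. This is a minimal illustration under assumed conventions (function names, a fixed per-slice point budget, and random sampling are illustrative choices, not the paper's actual implementation):

```python
import numpy as np

def events_to_point_cloud(events, num_slices=8, num_points=1024, seed=0):
    """Convert raw events (x, y, t, p) into fixed-size temporal slices of a
    spatiotemporal point cloud. All names here are illustrative assumptions,
    not the paper's API.

    events: (N, 4) array of columns [x, y, t, p].
    Returns: (num_slices, num_points, 4) array.
    """
    rng = np.random.default_rng(seed)
    t = events[:, 2]
    # Normalize timestamps to [0, 1] so slice indices are well defined.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    slice_idx = np.minimum((t_norm * num_slices).astype(int), num_slices - 1)

    slices = []
    for s in range(num_slices):
        pts = events[slice_idx == s]
        if len(pts) == 0:
            pts = np.zeros((num_points, 4))
        else:
            # Sample a fixed number of events per slice so every slice feeds
            # the point-cloud backbone with a constant-size set.
            choice = rng.choice(len(pts), num_points,
                                replace=len(pts) < num_points)
            pts = pts[choice]
        slices.append(pts)
    return np.stack(slices)
```

A temporal convolution across the first axis of the returned tensor would then model the short-term dependencies between adjacent slices that the time-slicing convolution module targets.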

📝 Abstract
Human pose estimation focuses on predicting body keypoints to analyze human motion. Event cameras provide high temporal resolution and low latency, enabling robust estimation under challenging conditions. However, most existing methods convert event streams into dense event frames, which adds extra computation and sacrifices the high temporal resolution of the event signal. In this work, we exploit the spatiotemporal properties of event streams within a point-cloud-based framework designed to enhance human pose estimation performance. We design an Event Temporal Slicing Convolution module to capture short-term dependencies across event slices, and combine it with an Event Slice Sequencing module for structured temporal modeling. We also apply edge enhancement to the point-cloud-based event representation to strengthen spatial edge information under sparse event conditions and further improve performance. Experiments on the DHP19 dataset show that our method consistently improves performance across three representative point cloud backbones: PointNet, DGCNN, and Point Transformer.
Problem

Research questions and friction points this paper is trying to address.

Exploiting spatiotemporal properties for efficient event-driven human pose estimation
Designing modules to capture short-term dependencies and structured temporal modeling
Enhancing spatial edge information in point cloud representation for sparse events
Innovation

Methods, ideas, or system contributions that make the work stand out.

Event Temporal Slicing Convolution captures short-term dependencies
Event Slice Sequencing enables structured temporal modeling
Edge enhancement in point cloud representation improves spatial information
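The edge-enhancement idea listed above, strengthening local geometric structure in a sparse event point cloud, is in the spirit of DGCNN-style edge features, where each point is described jointly with its offsets to nearby neighbors. A generic sketch (not the paper's exact module; the function name and feature layout are assumptions):

```python
import numpy as np

def knn_edge_features(points, k=4):
    """For each point, concatenate its coordinates with the offsets to its
    k nearest neighbors -- a DGCNN-style edge feature. Illustrative sketch,
    not the paper's edge-enhancement module.

    points: (N, 3) array of (x, y, t) event coordinates.
    Returns: (N, k, 6) array of [x_i, x_j - x_i] features.
    """
    # Pairwise squared distances between all points.
    diff = points[:, None, :] - points[None, :, :]
    dist = (diff ** 2).sum(-1)
    np.fill_diagonal(dist, np.inf)  # exclude each point from its own neighbors
    idx = np.argsort(dist, axis=1)[:, :k]            # (N, k) neighbor indices
    neighbors = points[idx]                          # (N, k, 3)
    center = np.repeat(points[:, None, :], k, axis=1)  # (N, k, 3)
    # Edge feature: absolute position plus relative offset to each neighbor.
    return np.concatenate([center, neighbors - center], axis=-1)
```

Under sparse event conditions, the relative-offset half of the feature carries the local edge geometry that raw per-point coordinates alone would miss.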
Haoxian Zhou
School of Computer Science, The University of Sydney, Sydney, NSW 2006, Australia
Chuanzhi Xu
Student, The University of Sydney
Neuromorphic Vision, High-level Vision, Computational Aesthetics
Langyi Chen
MPhil, University of Sydney
Computer Vision, Artificial Intelligence, Deep Learning
Haodong Chen
School of Computer Science, The University of Sydney, Sydney, NSW 2006, Australia
Yuk Ying Chung
School of Computer Science, The University of Sydney, Sydney, NSW 2006, Australia
Qiang Qu
Professor, Chinese Academy of Sciences, Shenzhen Institutes of Advanced Technology
Blockchain, Data Intelligence, Data-intensive Systems, Data Mining
Xiaoming Chen
School of Computer Science and Engineering, Beijing Technology and Business University, Beijing 100048, China
Weidong Cai
Clinical Associate Professor, Stanford University School of Medicine
functional neuroimaging, machine learning, cognitive, developmental, and clinical neuroscience