🤖 AI Summary
Most event-camera-based human pose estimation methods convert event streams into dense frames, which adds redundant computation and degrades temporal resolution. This paper instead proposes a point-cloud-based framework that builds spatiotemporal point clouds directly from raw asynchronous events. It introduces an Event Temporal Slicing Convolution module to capture short-term dependencies across event slices, an Event Slice Sequencing module for structured temporal modeling, and an edge-enhancement step in the point cloud representation that strengthens spatial edge information when events are sparse. The method is compatible with mainstream point cloud backbones, including PointNet, DGCNN, and Point Transformer. On the DHP19 dataset, it consistently improves performance across all three backbones, which supports the value of explicitly exploiting the spatiotemporal sparsity of event streams.
📝 Abstract
Human pose estimation focuses on predicting body keypoints to analyze human motion. Event cameras provide high temporal resolution and low latency, enabling robust estimation under challenging conditions. However, most existing methods convert event streams into dense event frames, which adds extra computation and sacrifices the high temporal resolution of the event signal. In this work, we exploit the spatiotemporal properties of event streams with a point cloud-based framework designed to improve human pose estimation. We design an Event Temporal Slicing Convolution module to capture short-term dependencies across event slices and combine it with an Event Slice Sequencing module for structured temporal modeling. We also apply edge enhancement to the point cloud-based event representation, which strengthens spatial edge information under sparse event conditions and further improves performance. Experiments on the DHP19 dataset show that the proposed method consistently improves performance across three representative point cloud backbones: PointNet, DGCNN, and Point Transformer.
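To make the point cloud-based event representation more concrete, the following is a minimal sketch of one way raw events could be turned into a spatiotemporal point cloud and grouped into time-ordered slices. It is not the authors' implementation. The function names, the `(x, y, timestamp, polarity)` event layout, the microsecond timestamps, the 346×260 sensor size, and the equal-duration slicing rule are all assumptions made for illustration, and the learned modules described in the abstract are not reproduced here.

```python
from typing import List, Tuple

# Assumed event layout: (x, y, timestamp in microseconds, polarity).
Event = Tuple[int, int, int, int]
# Point in a normalized cube: (x_norm, y_norm, t_norm, polarity).
Point = Tuple[float, float, float, int]


def events_to_points(events: List[Event], width: int, height: int) -> List[Point]:
    """Map raw events to points with x, y, t each scaled to [0, 1]."""
    if not events:
        return []
    t_min = min(e[2] for e in events)
    t_max = max(e[2] for e in events)
    span = (t_max - t_min) or 1  # guard against a window with one timestamp
    x_scale = max(width - 1, 1)
    y_scale = max(height - 1, 1)
    return [
        (x / x_scale, y / y_scale, (t - t_min) / span, p)
        for x, y, t, p in events
    ]


def slice_by_time(points: List[Point], num_slices: int) -> List[List[Point]]:
    """Group points into equal-duration temporal slices, ordered in time."""
    slices: List[List[Point]] = [[] for _ in range(num_slices)]
    for pt in points:
        # t_norm == 1.0 would index past the end, so clamp to the last slice.
        idx = min(int(pt[2] * num_slices), num_slices - 1)
        slices[idx].append(pt)
    return slices


# Hypothetical events; the sensor size is an illustrative assumption.
events = [(10, 20, 0, 1), (11, 20, 1000, 0), (200, 100, 4000, 1), (345, 259, 8000, 1)]
points = events_to_points(events, width=346, height=260)
slices = slice_by_time(points, num_slices=4)
```

Unlike accumulating events into a dense frame, each point here keeps its own timestamp, and empty slices stay empty instead of becoming all-zero images. A point cloud backbone such as PointNet could then consume each slice, or the ordered sequence of slices, as input.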