🤖 AI Summary
Event cameras are promising for pedestrian and traffic monitoring, but ground-truth annotations for safety-critical behaviors (e.g., distracted walking) are scarce, and existing datasets provide insufficient coverage of such scenarios. Method: This work introduces SEPose, the first large-scale synthetic event-based human pose dataset tailored for traffic scenarios, generated with dynamic vision sensors in the CARLA simulator. It comprises event streams from fixed cameras at four-way intersections in urban, suburban, and rural environments under diverse illumination and weather conditions, with nearly 350K pedestrians annotated with body pose keypoints. Contribution/Results: The authors train state-of-the-art models, including RVT and YOLOv8, on SEPose and evaluate them on real-world event data, demonstrating the dataset's sim-to-real generalization for pose estimation in low-latency, high-dynamic-range scenarios.
📝 Abstract
Event-based sensors have emerged as a promising solution for addressing challenging conditions in pedestrian and traffic monitoring systems. Their low latency and high dynamic range enable faster response in safety-critical situations caused by distracted walking or other unusual movements. However, the availability of data covering such scenarios remains limited. To address this gap, we present SEPose -- a comprehensive synthetic event-based human pose estimation dataset for fixed pedestrian perception, generated using dynamic vision sensors in the CARLA simulator. With nearly 350K pedestrians annotated with body pose keypoints from the perspective of fixed traffic cameras, SEPose is a synthetic multi-person pose estimation dataset that spans light and busy crowds and traffic across diverse lighting and weather conditions at four-way intersections in urban, suburban, and rural environments. We train existing state-of-the-art models such as RVT and YOLOv8 on our dataset and evaluate them on real event-based data to demonstrate the sim-to-real generalization capabilities of the proposed dataset.