🤖 AI Summary
To address accuracy degradation in dynamic indoor layout estimation caused by motion velocity and illumination variations, this paper introduces Ev-Layout—the first large-scale event-driven multimodal indoor layout dataset—comprising 2.5K motion sequences, 771K RGB images, 10 billion asynchronous event points, and synchronized IMU and ambient illumination time-series data. Methodologically, we propose a novel event-temporal distribution feature module and a plug-and-play spatiotemporal feature fusion module, enabling, for the first time, efficient heterogeneous co-modeling of event streams, RGB frames, IMU signals, and illumination data within a Transformer architecture. Evaluated on the Ev-Layout benchmark, our approach achieves an 18.7% improvement in layout estimation accuracy over state-of-the-art event-based methods, demonstrating the critical benefit of multimodal temporal协同 modeling for dynamic indoor scenes.
📝 Abstract
This paper presents Ev-Layout, a novel large-scale event-based multi-modal dataset designed for indoor layout estimation and tracking. Ev-Layout makes key contributions to the community by: Utilizing a hybrid data collection platform (with a head-mounted display and VR interface) that integrates both RGB and bio-inspired event cameras to capture indoor layouts in motion. Incorporating time-series data from inertial measurement units (IMUs) and ambient lighting conditions recorded during data collection to highlight the potential impact of motion speed and lighting on layout estimation accuracy. The dataset consists of 2.5K sequences, including over 771.3K RGB images and 10 billion event data points. Of these, 39K images are annotated with indoor layouts, enabling research in both event-based and video-based indoor layout estimation. Based on the dataset, we propose an event-based layout estimation pipeline with a novel event-temporal distribution feature module to effectively aggregate the spatio-temporal information from events. Additionally, we introduce a spatio-temporal feature fusion module that can be easily integrated into a transformer module for fusion purposes. Finally, we conduct benchmarking and extensive experiments on the Ev-Layout dataset, demonstrating that our approach significantly improves the accuracy of dynamic indoor layout estimation compared to existing event-based methods.