eNavi: Event-based Imitation Policies for Low-Light Indoor Mobile Robot Navigation

πŸ“… 2026-03-15
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of robust indoor navigation for mobile robots under low-light and fast-motion conditions, where conventional RGB cameras often fail and effective end-to-end control methods leveraging event cameras remain scarce. The authors introduce a novel real-world indoor person-following dataset, synchronously capturing event streams, RGB images, and expert control commands. They propose a late-fusion RGB-event navigation strategy based on behavioral cloning, employing dual MobileNet encoders and a Transformer-based fusion module for multimodal imitation learning. This study presents the first demonstration of event camera–based end-to-end navigation in real low-light indoor environments. The proposed method significantly outperforms RGB-only baselines in unseen scenes, achieving lower action prediction error and confirming the critical role of event data in enhancing policy robustness and environmental adaptability.
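The summary notes that the raw event streams are fed to MobileNet encoders, which expect image-like inputs. The page does not specify the event representation used, but a common choice is to accumulate events over a time window into a fixed-size, two-channel polarity histogram. A minimal sketch of that idea follows; the function name, argument layout, and `(t, x, y, polarity)` event format are illustrative assumptions, not the authors' released code:

```python
import numpy as np

def events_to_frame(events, height, width, t_start, t_end):
    """Accumulate events in [t_start, t_end) into a 2-channel
    polarity-count image (channel 0: negative, channel 1: positive).

    `events` is an (N, 4) array of (t, x, y, polarity) rows with
    polarity in {0, 1} -- a common raw-event layout, assumed here.
    """
    frame = np.zeros((2, height, width), dtype=np.float32)
    t, x, y, p = events[:, 0], events[:, 1], events[:, 2], events[:, 3]
    mask = (t >= t_start) & (t < t_end)
    x, y, p = x[mask].astype(int), y[mask].astype(int), p[mask].astype(int)
    # np.add.at accumulates correctly even when pixel indices repeat
    np.add.at(frame, (p, y, x), 1.0)
    return frame

# Toy example: three events, two at the same pixel with positive polarity
events = np.array([
    [0.01, 5, 3, 1],
    [0.02, 5, 3, 1],
    [0.03, 7, 2, 0],
])
frame = events_to_frame(events, height=10, width=10, t_start=0.0, t_end=0.05)
print(frame[1, 3, 5])  # → 2.0
print(frame[0, 2, 7])  # → 1.0
```

The resulting `(2, H, W)` tensor can be consumed by a standard CNN encoder in the same way as an RGB frame, which is what makes a dual-encoder late-fusion design straightforward.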

πŸ“ Abstract
Event cameras provide high dynamic range and microsecond-level temporal resolution, making them well-suited for indoor robot navigation, where conventional RGB cameras degrade under fast motion or low-light conditions. Despite advances in event-based perception spanning detection, SLAM, and pose estimation, there remains limited research on end-to-end control policies that exploit the asynchronous nature of event streams. To address this gap, we introduce a real-world indoor person-following dataset collected using a TurtleBot 2 robot, featuring synchronized raw event streams, RGB frames, and expert control actions across multiple indoor maps and trajectories, under both normal and low-light conditions. We further build a multimodal data preprocessing pipeline that temporally aligns event and RGB observations while reconstructing ground-truth actions from odometry to support high-quality imitation learning. Building on this dataset, we propose a late-fusion RGB-Event navigation policy that combines dual MobileNet encoders with a transformer-based fusion module trained via behavioral cloning. A systematic evaluation of RGB-only, Event-only, and RGB-Event fusion models across 12 training variations, ranging from single-path imitation to general multi-path imitation, shows that policies incorporating event data, particularly the fusion model, achieve improved robustness and lower action prediction error, especially in unseen low-light conditions where RGB-only models fail. We release the dataset, synchronization pipeline, and trained models at https://eventbasedvision.github.io/eNavi/
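The abstract mentions reconstructing ground-truth actions from odometry. For a differential-drive base like the TurtleBot 2, one plausible reconstruction recovers the linear and angular velocity between consecutive odometry poses. The sketch below is an assumption about how such a step could look, not the released pipeline; function and variable names are invented for illustration:

```python
import math

def action_from_odometry(pose_a, pose_b):
    """Recover (linear_v, angular_w) between two odometry samples.

    Each pose is (t, x, y, theta) in seconds, meters, radians.
    Assumes a differential-drive robot, so lateral slip is ignored.
    """
    ta, xa, ya, tha = pose_a
    tb, xb, yb, thb = pose_b
    dt = tb - ta
    if dt <= 0:
        raise ValueError("poses must be time-ordered")
    # Signed forward displacement: project the position change
    # onto the robot's initial heading direction
    dx, dy = xb - xa, yb - ya
    forward = dx * math.cos(tha) + dy * math.sin(tha)
    # Wrap the heading change into (-pi, pi] before differentiating
    dth = math.atan2(math.sin(thb - tha), math.cos(thb - tha))
    return forward / dt, dth / dt

# Robot moves 0.1 m along +x in 0.5 s while turning 0.1 rad
v, w = action_from_odometry((0.0, 0.0, 0.0, 0.0), (0.5, 0.1, 0.0, 0.1))
print(round(v, 3), round(w, 3))  # → 0.2 0.2
```

Velocity pairs of this form match the `(v, ω)` command interface of common differential-drive controllers, which is why they make natural behavioral-cloning targets.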
Problem

Research questions and friction points this paper is trying to address.

event camera
low-light navigation
imitation learning
mobile robot
end-to-end control
Innovation

Methods, ideas, or system contributions that make the work stand out.

event camera
imitation learning
multimodal fusion
low-light navigation
behavioral cloning
Prithvi Jai Ramesh
School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ, USA
Kaustav Chanda
Arizona State University
Computer Vision · Image Processing · Machine Learning · Event Cameras
Krishna Vinod
School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ, USA
Joseph Raj Vishal
School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ, USA
Yezhou Yang
School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ, USA
Bharatesh Chakravarthi
School of Computing and AI, Arizona State University
Event-based Vision · ITS · Human Pose Estimation