🤖 AI Summary
This work addresses the long-standing decoupling of optical flow and intensity estimation for event cameras, proposing the first unsupervised joint estimation framework. Methodologically, it derives an event-based photometric error grounded in the event camera's physical model and integrates contrast maximization into a unified loss function, explicitly modeling the intrinsic coupling between motion and appearance. Compared to conventional decoupled paradigms, the approach achieves state-of-the-art unsupervised optical flow estimation, reducing end-point error (EPE) by 20% and angular error (AE) by 25%. It also delivers competitive intensity reconstruction quality, particularly under high-dynamic-range conditions. Furthermore, the method attains shorter inference time than all single-task optical flow models and many intensity reconstruction models.
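To make the derivation concrete, the standard event generation model and a photometric error built on it can be written as follows. The notation (log intensity $L$, contrast threshold $C$, optical flow $\mathbf{v}$, event polarity $p_k$) follows common event-camera conventions; the residual form of the error is a plausible reading of the summary above, not necessarily the paper's exact formulation.

```latex
% Standard event generation model: an event (x_k, t_k, p_k) fires when the
% log intensity L changes by the contrast threshold C > 0:
\Delta L(\mathbf{x}_k, t_k) \doteq L(\mathbf{x}_k, t_k) - L(\mathbf{x}_k, t_k - \Delta t_k) = p_k\, C

% Linearizing under brightness constancy over a short \Delta t_k ties the
% intensity gradient to the optical flow v:
\Delta L(\mathbf{x}_k, t_k) \approx -\nabla L(\mathbf{x}_k) \cdot \mathbf{v}(\mathbf{x}_k)\, \Delta t_k

% so an event-based photometric error can penalize the residual over the
% event stream, constraining flow and intensity jointly:
E_{\mathrm{photo}}(\mathbf{v}, L) = \sum_k \big( \Delta L(\mathbf{x}_k, t_k) - p_k\, C \big)^2
```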
📝 Abstract
Event cameras rely on motion to obtain information about scene appearance. In other words, for event cameras, motion and appearance are perceived together or not at all, jointly encoded in the output event stream. Previous works treat the recovery of these two visual quantities as separate tasks, which does not fit the nature of event cameras and neglects the inherent relation between them. In this paper, we propose an unsupervised learning framework that jointly estimates optical flow (motion) and image intensity (appearance) with a single network. Starting from the event generation model, we derive a novel event-based photometric error as a function of optical flow and image intensity, and combine it with the contrast maximization framework, yielding a comprehensive loss function that provides proper constraints for both flow and intensity estimation. Extensive experiments show that our model achieves state-of-the-art performance for both optical flow (20% and 25% improvement in EPE and AE, respectively, in the unsupervised learning category) and intensity estimation (competitive with other baselines, particularly in high-dynamic-range scenarios). Last but not least, our model achieves shorter inference time than all the other optical flow models and many of the image reconstruction models, even though each of those outputs only a single quantity. Project page: https://github.com/tub-rip/e2fai
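The contrast maximization term mentioned in the abstract rewards flows that warp events into a sharp image of warped events (IWE). The helper below is a minimal, hypothetical PyTorch sketch of such a term; the function name, the bilinear splatting, and the negative-variance objective are our assumptions for illustration, not the authors' code from the repository.

```python
import torch

def contrast_maximization_loss(events, flow, t_ref, img_size):
    """Minimal sketch of a contrast-maximization loss (hypothetical helper).

    Events are warped to a reference time along a per-event flow and
    splatted bilinearly into an image of warped events (IWE); sharper
    IWEs (higher variance) indicate better flow, so the negative
    variance is returned as a loss to minimize.

    events:   (N, 4) tensor of (x, y, t, polarity)
    flow:     (N, 2) tensor of per-event flow (vx, vy) in pixels/second
    t_ref:    scalar reference time
    img_size: (H, W)
    """
    H, W = img_size
    x, y, t = events[:, 0], events[:, 1], events[:, 2]

    # Warp each event to the reference time along its flow vector.
    dt = t - t_ref
    xw = x - flow[:, 0] * dt
    yw = y - flow[:, 1] * dt

    # Bilinear splatting keeps the IWE differentiable w.r.t. the flow.
    x0, y0 = xw.floor(), yw.floor()
    wx, wy = xw - x0, yw - y0
    iwe = torch.zeros(H * W, dtype=events.dtype, device=events.device)
    for dx, dy, w in [(0, 0, (1 - wx) * (1 - wy)),
                      (1, 0, wx * (1 - wy)),
                      (0, 1, (1 - wx) * wy),
                      (1, 1, wx * wy)]:
        xi = (x0 + dx).long().clamp(0, W - 1)
        yi = (y0 + dy).long().clamp(0, H - 1)
        iwe.index_add_(0, yi * W + xi, w)

    # Maximizing IWE variance == minimizing its negative.
    return -iwe.var()
```

In the joint setting the abstract describes, a term like this would be combined with the event-based photometric error so that the flow and intensity estimates constrain each other within a single loss.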