🤖 AI Summary
This work addresses the challenge of training event-based stereo matching models, which typically rely on expensive active sensors to obtain ground-truth annotations. To circumvent this dependency, the authors propose a general training framework that eliminates the need for active sensors. By constructing the EventHub data factory, they leverage standard RGB images and advanced novel view synthesis techniques to generate proxy event data along with corresponding stereo labels, enabling effective supervision of event stereo networks. This approach yields, for the first time, a highly generalizable event stereo model trained without any real ground-truth annotations. Moreover, the same data distillation mechanism can be leveraged to enhance the performance of conventional RGB-based stereo models under challenging low-light conditions such as nighttime. Extensive experiments across multiple established event stereo benchmarks demonstrate the method's effectiveness and its superiority over existing approaches.
📝 Abstract
We propose EventHub, a novel framework for training deep event stereo networks without ground-truth annotations from costly active sensors, relying instead on standard color images. From these images, we derive either both proxy annotations and proxy events through state-of-the-art novel view synthesis techniques, or proxy annotations alone when the images are already paired with event data. Using the training set generated by our data factory, we repurpose state-of-the-art stereo models from the RGB literature to process event data, obtaining new event stereo models with unprecedented generalization capabilities. Experiments on widely used event stereo datasets confirm the effectiveness of EventHub and show how the same data distillation mechanism can improve the accuracy of RGB stereo foundation models in challenging conditions such as nighttime scenes.
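To make the notion of "proxy events" concrete, the sketch below illustrates the standard contrast-threshold event camera model that simulators such as ESIM build on: a pixel fires an event whenever its log-intensity change between consecutive frames exceeds a threshold. This is a minimal illustration of the general principle, not the paper's actual pipeline (which derives events from novel view synthesis); the function name `simulate_events` and the threshold value are our own illustrative choices.

```python
import numpy as np

def simulate_events(frame_prev, frame_next, threshold=0.2, eps=1e-3):
    """Emit per-pixel events where the log-intensity change between two
    grayscale frames (values in [0, 1]) exceeds a contrast threshold.

    Returns an (N, 3) int array of (x, y, polarity) triples, with
    polarity +1 for brightness increases and -1 for decreases.
    """
    # Event cameras respond to changes in log intensity; eps avoids log(0).
    log_prev = np.log(frame_prev.astype(np.float64) + eps)
    log_next = np.log(frame_next.astype(np.float64) + eps)
    diff = log_next - log_prev

    # Fire an event wherever the change crosses the contrast threshold.
    ys, xs = np.nonzero(np.abs(diff) >= threshold)
    polarities = np.sign(diff[ys, xs]).astype(np.int64)
    return np.stack([xs, ys, polarities], axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f0 = rng.random((480, 640))
    f1 = np.clip(f0 + rng.normal(0.0, 0.1, f0.shape), 0.0, 1.0)
    events = simulate_events(f0, f1)
    print(f"{len(events)} events fired")
```

Because this model operates per pixel on ordinary intensity images, any source of densely sampled views, including rendered sequences from novel view synthesis, can in principle be turned into event streams, which is what makes a data factory built on standard color images plausible.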