🤖 AI Summary
Stereo matching models trained on synthetic data suffer severe performance degradation in real-world scenes due to domain gaps in color, illumination, and texture. To address this, we propose an uncertainty-guided statistical domain augmentation method: for the first time, we treat RGB channel means and standard deviations as carriers of domain characteristics; we model their perturbation direction and magnitude via batch-level Gaussian sampling to capture uncertainty; and we enforce feature consistency between original and augmented image pairs to achieve structure-aware, shortcut-invariant cross-domain representation learning. Our approach is lightweight and generic, requiring no modification to the backbone architecture and enabling plug-and-play integration. Extensive experiments on multiple cross-domain benchmarks, including SceneFlow→KITTI and Driving→ETH3D, demonstrate significant improvements over state-of-the-art methods, validating both effectiveness and robustness.
📝 Abstract
State-of-the-art stereo matching (SM) models trained on synthetic data often fail to generalize to real-world domains due to domain differences such as color, illumination, contrast, and texture. To address this challenge, we leverage data augmentation to expand the training domain, encouraging the model to acquire robust cross-domain feature representations instead of domain-dependent shortcuts. This paper proposes an uncertainty-guided data augmentation (UgDA) method, built on the observation that image statistics in RGB space (mean and standard deviation) carry the domain characteristics; samples in unseen domains can therefore be generated by properly perturbing these statistics. Furthermore, to simulate a wider range of potential domains, Gaussian distributions founded on batch-level statistics are proposed to model the uncertainty of the perturbation direction and intensity. Additionally, we enforce feature consistency between the original and augmented data of the same scene, encouraging the model to learn structure-aware, shortcut-invariant feature representations. Our approach is simple, architecture-agnostic, and can be integrated into any SM network. Extensive experiments on several challenging benchmarks demonstrate that our method significantly improves the generalization performance of existing SM networks.
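The statistical perturbation described above can be sketched as follows. This is a minimal NumPy sketch based on our reading of the abstract, not the authors' released implementation: the function names (`ugda_augment`, `consistency_loss`) are hypothetical, and the specific choice of the batch-level standard deviation of each statistic as the Gaussian scale is an illustrative assumption.

```python
import numpy as np

def ugda_augment(batch, rng=None):
    """Sketch of uncertainty-guided statistical augmentation.

    batch: float array of shape (B, C, H, W), a batch of RGB images.
    Perturbs each image's channel-wise mean/std by Gaussian noise whose
    scale is the batch-level variability of those statistics, then
    re-styles the image with the perturbed statistics.
    """
    if rng is None:
        rng = np.random.default_rng()
    B, C, _, _ = batch.shape
    eps = 1e-6
    mu = batch.mean(axis=(2, 3), keepdims=True)         # (B, C, 1, 1)
    sigma = batch.std(axis=(2, 3), keepdims=True) + eps

    # Batch-level uncertainty: how far the channel statistics drift
    # across the batch bounds how far a plausible unseen domain may drift.
    sig_mu = mu.std(axis=0, keepdims=True)              # (1, C, 1, 1)
    sig_sigma = sigma.std(axis=0, keepdims=True)

    # Sample perturbation direction and magnitude per image and channel.
    beta = mu + sig_mu * rng.standard_normal((B, C, 1, 1))
    gamma = sigma + sig_sigma * rng.standard_normal((B, C, 1, 1))

    # Normalize with the original stats, re-style with perturbed stats.
    normalized = (batch - mu) / sigma
    return gamma * normalized + beta

def consistency_loss(feat_orig, feat_aug):
    """L2 feature-consistency term between the two views of a scene."""
    return np.mean((feat_orig - feat_aug) ** 2)
```

In training, both the original and the augmented pair would be fed through the same SM backbone, with `consistency_loss` applied to their intermediate features alongside the usual disparity loss; when all images in a batch share identical statistics, the sampled perturbation collapses to zero and the augmentation is the identity.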