RoSe: Robust Self-supervised Stereo Matching under Adverse Weather Conditions

📅 2025-09-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing self-supervised stereo matching methods suffer severe performance degradation under adverse weather conditions—such as nighttime, rain, and fog—primarily because CNN-based feature extractors struggle to model degraded regions (e.g., reflections, textureless areas), and the photometric consistency assumption breaks down. To address this, we propose RoSe: (1) We inject robust semantic priors from vision foundation models to enhance feature representation; (2) we construct scene correspondence priors to synthesize semantically and disparity-consistent adverse-weather data; and (3) we design a two-stage self-supervised training paradigm that jointly incorporates weather-degradation distillation, CNN feature enhancement, and decoupled photometric consistency modeling. Extensive experiments on multiple adverse-weather benchmarks demonstrate that RoSe significantly outperforms state-of-the-art methods, achieving superior generalization and robustness across diverse weather conditions.
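The weather-distillation idea above hinges on one property of the synthetic data: the clean pair and its adverse-weather counterpart share the same disparity, so the clean-pair prediction can act as a pseudo-label for the adverse-pair prediction. A minimal NumPy sketch of such an alignment loss (illustrative only; the function name, masking scheme, and NumPy stand-in are assumptions, not the paper's implementation):

```python
import numpy as np

def alignment_loss(disp_adverse, disp_clean, valid_mask):
    """Mean L1 gap between the disparity predicted from the adverse-weather
    pair and the disparity predicted from the corresponding clean pair
    (treated here as a fixed pseudo-label), restricted to valid pixels.
    """
    diff = np.abs(disp_adverse - disp_clean)
    return diff[valid_mask].mean()
```

In a real training loop the clean-pair prediction would be detached from the gradient graph so supervision flows only into the adverse-weather branch.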

📝 Abstract
Recent self-supervised stereo matching methods have made significant progress, but their performance significantly degrades under adverse weather conditions such as night, rain, and fog. We identify two primary weaknesses contributing to this performance degradation. First, adverse weather introduces noise and reduces visibility, making CNN-based feature extractors struggle with degraded regions like reflective and textureless areas. Second, these degraded regions can disrupt accurate pixel correspondences, leading to ineffective supervision based on the photometric consistency assumption. To address these challenges, we propose injecting robust priors derived from the visual foundation model into the CNN-based feature extractor to improve feature representation under adverse weather conditions. We then introduce scene correspondence priors to construct robust supervisory signals rather than relying solely on the photometric consistency assumption. Specifically, we create synthetic stereo datasets with realistic weather degradations. These datasets feature clear and adverse image pairs that maintain the same semantic context and disparity, preserving the scene correspondence property. With this knowledge, we propose a robust self-supervised training paradigm, consisting of two key steps: robust self-supervised scene correspondence learning and adverse weather distillation. Both steps aim to align underlying scene results from clean and adverse image pairs, thus improving model disparity estimation under adverse weather effects. Extensive experiments demonstrate the effectiveness and versatility of our proposed solution, which outperforms existing state-of-the-art self-supervised methods. Codes are available at https://github.com/cocowy1/RoSe-Robust-Self-supervised-Stereo-Matching-under-Adverse-Weather-Conditions.
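The photometric consistency assumption the abstract refers to supervises disparity by warping the right image into the left view and penalizing appearance differences; under rain or night imagery the two views no longer look alike, so this loss misleads training. A minimal NumPy sketch of that loss, assuming rectified single-channel images and horizontal disparity (illustrative, not the paper's code):

```python
import numpy as np

def warp_right_to_left(right, disparity):
    """Warp the right image into the left view via per-pixel disparity.

    right: (H, W) intensity image; disparity: (H, W) non-negative shifts.
    Pixels sampled from outside the image are marked invalid (NaN).
    """
    H, W = right.shape
    xs = np.arange(W)[None, :] - disparity        # source x-coords in right view
    warped = np.full_like(right, np.nan, dtype=float)
    valid = (xs >= 0) & (xs <= W - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, W - 2)
    w = xs - x0                                   # linear interpolation weight
    rows = np.repeat(np.arange(H)[:, None], W, axis=1)
    interp = (1 - w) * right[rows, x0] + w * right[rows, x0 + 1]
    warped[valid] = interp[valid]
    return warped

def photometric_loss(left, right, disparity):
    """Mean L1 difference between the left image and the warped right image,
    ignoring pixels without a valid correspondence."""
    warped = warp_right_to_left(right, disparity)
    mask = ~np.isnan(warped)
    return np.abs(left - warped)[mask].mean()
```

With the correct disparity the warped right image reproduces the left image and the loss vanishes; weather-induced appearance changes break exactly this equality, which is why the paper replaces it with scene-correspondence supervision.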
Problem

Research questions and friction points this paper is trying to address.

Improving stereo matching performance under adverse weather conditions like night, rain, and fog
Addressing CNN feature extractor limitations with degraded regions in bad weather
Developing robust supervision beyond photometric consistency for weather-affected images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Injecting visual foundation model priors into CNN feature extractors
Using scene correspondence priors instead of photometric consistency
Creating synthetic stereo datasets with realistic weather degradations
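The first bullet, injecting foundation-model priors into the CNN extractor, is commonly realized by upsampling the (coarser) foundation-model features, projecting them to the CNN channel width, and adding them as a residual. A hedged NumPy sketch under those assumptions (the upsampling choice, projection, and all names are illustrative, not the paper's architecture):

```python
import numpy as np

def upsample_nn(feat, scale):
    """Nearest-neighbour upsampling of a (h, w, C) feature map by `scale`."""
    return np.repeat(np.repeat(feat, scale, axis=0), scale, axis=1)

def fuse_priors(cnn_feat, vfm_feat, W_proj):
    """Inject foundation-model (VFM) semantic priors into CNN features.

    cnn_feat: (H, W, C_cnn) CNN features; vfm_feat: (h, w, C_vfm) coarser
    VFM features with H = h * scale; W_proj: (C_vfm, C_cnn) projection.
    The priors are projected to the CNN channel dim and added residually.
    """
    scale = cnn_feat.shape[0] // vfm_feat.shape[0]
    up = upsample_nn(vfm_feat, scale)
    return cnn_feat + up @ W_proj
```

In practice the projection would be a learned 1x1 convolution and the foundation-model backbone would stay frozen, so only the fusion layers adapt during training.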
Yun Wang
Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR, China
Junjie Hu
Chinese University of Hong Kong, Shenzhen, China
Junhui Hou
Department of Computer Science, City University of Hong Kong
Neural Spatial Computing
Chenghao Zhang
Renmin University of China
Natural Language Processing, Information Retrieval, Multimodal
Renwei Yang
Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR, China
Dapeng Oliver Wu
City University of Hong Kong
machine learning, communications, video coding, signal processing, computer vision