🤖 AI Summary
Addressing the challenge of simultaneously achieving model lightweighting and zero-shot generalization in stereo matching, this paper proposes the first ultra-lightweight stereo depth estimation framework. Methodologically, we design a compact yet expressive backbone network, introduce a hybrid cost aggregation module, and establish a three-stage, million-scale training strategy (simulated, then synthetic, then real data) to enhance domain robustness. We empirically demonstrate that an ultra-light model with only 0.5M parameters and less than 1% of the FLOPs of state-of-the-art (SOTA) methods can achieve superior cross-domain generalization. The framework attains SOTA performance on the synthetic SceneFlow benchmark and on the major real-world benchmarks KITTI, ETH3D, and Middlebury, matching or even surpassing heavy non-prior-based models in accuracy. This result overturns the conventional trade-off between model efficiency and generalization capability.
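The summary above refers to cost aggregation over a matching cost volume, the core operation in learned stereo matching. As a generic illustration only (the paper's actual hybrid aggregation module is not detailed here, and all names and shapes below are illustrative assumptions), the following sketch builds a correlation cost volume from left/right feature maps and reads out disparity with a naive winner-take-all:

```python
import numpy as np

def correlation_cost_volume(feat_l, feat_r, max_disp):
    """Correlation cost volume: for each candidate disparity d, correlate
    left features with right features shifted by d pixels.
    feat_l, feat_r: (C, H, W) feature maps -> cost of shape (max_disp, H, W)."""
    C, H, W = feat_l.shape
    cost = np.zeros((max_disp, H, W), dtype=feat_l.dtype)
    for d in range(max_disp):
        if d == 0:
            cost[0] = (feat_l * feat_r).mean(axis=0)
        else:
            # Left pixel x matches right pixel x - d; columns x < d stay zero.
            cost[d, :, d:] = (feat_l[:, :, d:] * feat_r[:, :, :-d]).mean(axis=0)
    return cost

def winner_take_all(cost):
    # Pick, per pixel, the disparity with the highest correlation score.
    # Stereo networks instead aggregate/regularize the volume before readout,
    # which is where a module like the paper's hybrid aggregation would sit.
    return cost.argmax(axis=0)
```

With distinctive per-pixel features, the winner-take-all readout recovers the true shift; in practice the raw volume is noisy in textureless regions, which is why learned aggregation matters.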
📝 Abstract
Recent advances in stereo matching have focused on accuracy, often at the cost of significantly increased model size. Traditionally, the community has regarded efficient models as incapable of zero-shot generalization due to their limited capacity. In this paper, we introduce Lite Any Stereo, a stereo depth estimation framework that achieves strong zero-shot generalization while remaining highly efficient. To this end, we design a compact yet expressive backbone to ensure scalability, along with a carefully crafted hybrid cost aggregation module. We further propose a three-stage training strategy on million-scale data to effectively bridge the sim-to-real gap. Together, these components demonstrate that an ultra-light model can deliver strong generalization, ranking 1st across four widely used real-world benchmarks. Remarkably, our model attains accuracy comparable to or exceeding that of state-of-the-art non-prior-based accurate methods while requiring less than 1% of their computational cost, setting a new standard for efficient stereo matching.