🤖 AI Summary
Local stereo matching methods lack global consistency, while global methods suffer from excessive computational cost. To address both limitations, this paper proposes an efficient depth estimation model that generalizes across resolutions and disparity ranges without dataset-specific fine-tuning. The method introduces a multi-resolution Transformer architecture with sparse attention that significantly reduces memory consumption, together with a loss function that concentrates probability on feasible matches and enables joint estimation of disparity, occlusion, and confidence maps; it eliminates both cost volume filtering and deep refinement networks. Evaluated on the Middlebury v3 and ETH3D benchmarks, the model achieves state-of-the-art accuracy, substantially outperforming existing approaches, while maintaining high computational efficiency and superior detail reconstruction.
📝 Abstract
The pursuit of a generalizable stereo matching model, capable of performing across varying resolutions and disparity ranges without dataset-specific fine-tuning, has revealed a fundamental trade-off. Iterative local search methods achieve high scores on constrained benchmarks, but their core mechanism inherently limits the global consistency required for true generalization. On the other hand, global matching architectures, while theoretically more robust, have been historically rendered infeasible by prohibitive computational and memory costs. We resolve this dilemma with $S^2M^2$: a global matching architecture that achieves both state-of-the-art accuracy and high efficiency without relying on cost volume filtering or deep refinement stacks. Our design integrates a multi-resolution transformer for robust long-range correspondence, trained with a novel loss function that concentrates probability on feasible matches. This approach enables a more robust joint estimation of disparity, occlusion, and confidence. $S^2M^2$ establishes a new state of the art on the Middlebury v3 and ETH3D benchmarks, significantly outperforming prior methods across most metrics while reconstructing high-quality details with competitive efficiency.
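The abstract describes a loss that "concentrates probability on feasible matches" over disparity candidates. The paper's exact formulation is not given here; as a minimal illustrative sketch under that assumption, one can treat the network output as a softmax distribution over discrete disparity candidates and penalize the probability mass falling outside a small window around the ground-truth disparity. The function name `feasible_match_loss` and the `radius` parameter are hypothetical, not from the paper.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the candidate axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def feasible_match_loss(logits, gt_disp, radius=1.0):
    """Hypothetical probability-concentration loss (not the paper's actual form).

    logits  : (N, D) matching scores over D integer disparity candidates per pixel.
    gt_disp : (N,) ground-truth disparities.
    radius  : candidates within this distance of the ground truth count as feasible.

    Returns the mean negative log of the probability mass assigned to
    feasible candidates, so the loss is minimized when the predicted
    distribution concentrates near the true disparity.
    """
    n_pixels, n_cands = logits.shape
    cands = np.arange(n_cands)  # candidate disparities 0..D-1
    # Boolean mask of feasible candidates per pixel.
    feasible = np.abs(cands[None, :] - gt_disp[:, None]) <= radius
    p = softmax(logits)
    # Total probability mass on feasible candidates; clip to avoid log(0).
    p_feasible = (p * feasible).sum(axis=1)
    return -np.log(np.clip(p_feasible, 1e-12, None)).mean()
```

A distribution sharply peaked at the true disparity yields a near-zero loss, while a flat distribution is penalized in proportion to how little mass lands inside the feasible window; occlusion and confidence terms would be additional heads in the joint objective.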