WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool

๐Ÿ“… 2025-09-05
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing online 3D reconstruction methods struggle to simultaneously achieve high camera pose estimation accuracy, dense and geometrically consistent point cloud quality, and real-time performance. This paper proposes a feed-forward online reconstruction framework addressing these challenges. Our method introduces three key innovations: (1) an intra-sliding-window inter-frame feature interaction mechanism to enhance geometric consistency; (2) a globally updatable camera token pool enabling cross-window pose propagation and refinement; and (3) a co-designed compact camera representation with a lightweight feed-forward network, drastically reducing computational overhead. Evaluated on multiple standard benchmarks, our approach achieves state-of-the-art online pose accuracy (12.3% reduction in absolute trajectory error), improved point cloud completeness (8.7% increase in F-score), and superior reconstruction efficiencyโ€”all while maintaining millisecond-level inference latency. The framework establishes a new paradigm for real-time SLAM and dynamic scene understanding.

Technology Category

Application Category

๐Ÿ“ Abstract
We present WinT3R, a feed-forward reconstruction model capable of online prediction of precise camera poses and high-quality point maps. Previous methods suffer from a trade-off between reconstruction quality and real-time performance. To address this, we first introduce a sliding window mechanism that ensures sufficient information exchange among frames within the window, thereby improving the quality of geometric predictions without large computation. In addition, we leverage a compact representation of cameras and maintain a global camera token pool, which enhances the reliability of camera pose estimation without sacrificing efficiency. These designs enable WinT3R to achieve state-of-the-art performance in terms of online reconstruction quality, camera pose estimation, and reconstruction speed, as validated by extensive experiments on diverse datasets. Code and model are publicly available at https://github.com/LiZizun/WinT3R.
Problem

Research questions and friction points this paper is trying to address.

Addresses trade-off between reconstruction quality and real-time performance
Improves geometric predictions through sliding window information exchange
Enhances camera pose estimation reliability while maintaining efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sliding window mechanism for frame information exchange
Compact camera representation with global token pool
State-of-the-art online reconstruction quality and speed
๐Ÿ”Ž Similar Papers
No similar papers found.