WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool

📅 2025-09-05

📈 Citations: 0

✨ Influential: 0

career value

178K/year

🤖 AI Summary

Existing online 3D reconstruction methods struggle to simultaneously achieve high camera pose estimation accuracy, dense and geometrically consistent point cloud quality, and real-time performance. This paper proposes a feed-forward online reconstruction framework addressing these challenges. Our method introduces three key innovations: (1) an intra-sliding-window inter-frame feature interaction mechanism to enhance geometric consistency; (2) a globally updatable camera token pool enabling cross-window pose propagation and refinement; and (3) a co-designed compact camera representation with a lightweight feed-forward network, drastically reducing computational overhead. Evaluated on multiple standard benchmarks, our approach achieves state-of-the-art online pose accuracy (12.3% reduction in absolute trajectory error), improved point cloud completeness (8.7% increase in F-score), and superior reconstruction efficiency—all while maintaining millisecond-level inference latency. The framework establishes a new paradigm for real-time SLAM and dynamic scene understanding.

Technology Category

Application Category

📝 Abstract

We present WinT3R, a feed-forward reconstruction model capable of online prediction of precise camera poses and high-quality point maps. Previous methods suffer from a trade-off between reconstruction quality and real-time performance. To address this, we first introduce a sliding window mechanism that ensures sufficient information exchange among frames within the window, thereby improving the quality of geometric predictions without large computation. In addition, we leverage a compact representation of cameras and maintain a global camera token pool, which enhances the reliability of camera pose estimation without sacrificing efficiency. These designs enable WinT3R to achieve state-of-the-art performance in terms of online reconstruction quality, camera pose estimation, and reconstruction speed, as validated by extensive experiments on diverse datasets. Code and model are publicly available at https://github.com/LiZizun/WinT3R.

Problem

Research questions and friction points this paper is trying to address.

Addresses trade-off between reconstruction quality and real-time performance

Improves geometric predictions through sliding window information exchange

Enhances camera pose estimation reliability while maintaining efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sliding window mechanism for frame information exchange

Compact camera representation with global token pool

State-of-the-art online reconstruction quality and speed

🔎 Similar Papers

No similar papers found.