AI Summary
Existing online 3D reconstruction methods struggle to combine accurate camera pose estimation, dense and geometrically consistent point clouds, and real-time performance. This paper proposes a feed-forward online reconstruction framework that addresses these challenges. The method introduces three key innovations: (1) a feature interaction mechanism among the frames within each sliding window, which enhances geometric consistency; (2) a globally updatable camera token pool that enables cross-window pose propagation and refinement; and (3) a compact camera representation co-designed with a lightweight feed-forward network, which reduces computational overhead. Evaluated on multiple standard benchmarks, the approach achieves state-of-the-art online pose accuracy (12.3% reduction in absolute trajectory error), improved point cloud completeness (8.7% increase in F-score), and superior reconstruction efficiency, all while maintaining millisecond-level inference latency. The framework establishes a new paradigm for real-time SLAM and dynamic scene understanding.
Abstract
We present WinT3R, a feed-forward reconstruction model capable of online prediction of precise camera poses and high-quality point maps. Previous methods suffer from a trade-off between reconstruction quality and real-time performance. To address this, we first introduce a sliding window mechanism that ensures sufficient information exchange among frames within the window, thereby improving the quality of geometric predictions without heavy computation. In addition, we leverage a compact representation of cameras and maintain a global camera token pool, which enhances the reliability of camera pose estimation without sacrificing efficiency. These designs enable WinT3R to achieve state-of-the-art performance in terms of online reconstruction quality, camera pose estimation, and reconstruction speed, as validated by extensive experiments on diverse datasets. Code and model are publicly available at https://github.com/LiZizun/WinT3R.
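The control flow described in the abstract (process frames in sliding windows, and keep one compact camera token per frame in a global pool that later windows can read and refresh) can be illustrated with a minimal Python sketch. This is not the WinT3R implementation: the names `sliding_windows`, `run_online`, and `toy_encode`, the window size and stride, the overlap between windows, and the token layout are all assumptions made for illustration, and the real model would use learned networks where the toy encoder stands.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Token = Tuple[float, ...]  # hypothetical stand-in for a compact camera token
Encoder = Callable[[Sequence[float], Dict[int, Token]], List[Token]]


def sliding_windows(num_frames: int, size: int, stride: int) -> Iterator[range]:
    """Yield index ranges of at most `size` frames, advancing by `stride`."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    start = 0
    while start < num_frames:
        yield range(start, min(start + size, num_frames))
        if start + size >= num_frames:
            break  # the last frame is covered, so stop
        start += stride


def run_online(frames: Sequence[float], size: int, stride: int,
               encode: Encoder) -> Dict[int, Token]:
    """Process frames window by window while maintaining a global token pool."""
    pool: Dict[int, Token] = {}  # frame index -> latest camera token
    for idx in sliding_windows(len(frames), size, stride):
        window = [frames[i] for i in idx]
        # The encoder sees the whole window (intra-window interaction)
        # and the pool built so far (global context).
        tokens = encode(window, pool)
        for i, tok in zip(idx, tokens):
            pool[i] = tok  # frames shared by two windows get a refreshed token
    return pool


def toy_encode(window: Sequence[float], pool: Dict[int, Token]) -> List[Token]:
    """Toy encoder (assumption): token = (frame value, window mean, pool size)."""
    mean = sum(window) / len(window)
    return [(x, mean, float(len(pool))) for x in window]


if __name__ == "__main__":
    frames = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # scalar placeholders for images
    pool = run_online(frames, size=4, stride=2, encode=toy_encode)
    for frame_index in sorted(pool):
        print(frame_index, pool[frame_index])
```

Keying the pool by frame index keeps its size linear in the number of frames, and each token is small, so carrying global context across windows stays cheap in this sketch.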