🤖 AI Summary
This work proposes the first feed-forward framework that jointly achieves high-fidelity 3D Gaussian reconstruction and accurate camera pose estimation from multi-view images without prior pose information. The method introduces a Token-aligned Gaussian Prediction module, learnable camera tokens, and an Asymmetric Dual-Flow Decoder (ADF-Decoder). The prediction module aggregates multi-scale contextual features to enable long-range cross-view reasoning in feature space, while the decoder enforces directionally constrained communication between camera and image tokens. This design keeps viewpoint cues and scene semantics cleanly factorized within a feed-forward architecture, eliminating the need for iterative optimization. Experiments demonstrate that the proposed approach achieves higher reconstruction fidelity and novel-view synthesis quality than existing pose-free methods, and significantly improves pose estimation accuracy.
📝 Abstract
We present TokenSplat, a feed-forward framework for joint 3D Gaussian reconstruction and camera pose estimation from unposed multi-view images. At its core, TokenSplat introduces a Token-aligned Gaussian Prediction module that aligns semantically corresponding information across views directly in the feature space. Guided by coarse token positions and fusion confidence, it aggregates multi-scale contextual features to enable long-range cross-view reasoning and reduce redundancy from overlapping Gaussians. To further enhance pose robustness and disentangle viewpoint cues from scene semantics, TokenSplat employs learnable camera tokens and an Asymmetric Dual-Flow Decoder (ADF-Decoder) that enforces directionally constrained communication between camera and image tokens. This maintains clean factorization within a feed-forward architecture, enabling coherent reconstruction and stable pose estimation without iterative refinement. Extensive experiments demonstrate that TokenSplat achieves higher reconstruction fidelity and novel-view synthesis quality in pose-free settings, and significantly improves pose estimation accuracy compared to prior pose-free methods. Project page: https://kidleyh.github.io/tokensplat/.
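To make the idea of directionally constrained communication between camera and image tokens more concrete, the following is a minimal sketch of an asymmetric attention mask, not the ADF-Decoder itself. The abstract does not state which direction is restricted, so the choice here (camera tokens may read from image tokens, image tokens may not read from camera tokens) is an assumption, as are the hypothetical function names `build_asymmetric_mask` and `masked_attention` and the token ordering `[camera..., image...]`.

```python
import math


def build_asymmetric_mask(num_cam, num_img, cam_sees_img=True, img_sees_cam=False):
    """Return allowed[q][k] for tokens ordered [camera..., image...].

    The default directions are an illustrative assumption, not taken from the paper.
    """
    n = num_cam + num_img
    allowed = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            q_is_cam = q < num_cam
            k_is_cam = k < num_cam
            if q_is_cam == k_is_cam:
                allowed[q][k] = True          # attention within a stream is always allowed
            elif q_is_cam:
                allowed[q][k] = cam_sees_img  # camera query reading an image key
            else:
                allowed[q][k] = img_sees_cam  # image query reading a camera key
    return allowed


def masked_attention(queries, keys, values, allowed):
    """Scaled dot-product attention in which disallowed pairs get zero weight."""
    dim = len(queries[0])
    outputs = []
    for q_idx, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            if allowed[q_idx][k_idx] else -math.inf
            for k_idx, k in enumerate(keys)
        ]
        peak = max(scores)  # finite, because every token may attend to itself
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))
        ])
    return outputs


# One camera token followed by two image tokens (toy 2-D features).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mask = build_asymmetric_mask(num_cam=1, num_img=2)
fused = masked_attention(tokens, tokens, tokens, mask)
```

Under this assumed mask, changing the camera token leaves the image-token outputs untouched, while the camera token still aggregates information from the image stream. That is one simple way a one-directional flow could keep viewpoint cues from leaking into scene features.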