TokenSplat: Token-aligned 3D Gaussian Splatting for Feed-forward Pose-free Reconstruction

📅 2026-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes the first feed-forward framework capable of simultaneously achieving high-fidelity 3D reconstruction and accurate camera pose estimation from multi-view images without prior pose information. The method introduces a Token-aligned Gaussian Prediction module, learnable camera tokens, and an Asymmetric Dual-Flow Decoder (ADF-Decoder). The prediction module fuses multi-scale features to enable long-range cross-view reasoning in feature space, while the decoder enforces directionally constrained communication between camera and image tokens. This keeps viewpoint cues and scene semantics factorized within a single feed-forward pass, eliminating the need for iterative optimization. Experiments demonstrate that the proposed approach significantly outperforms existing pose-free methods in reconstruction fidelity, novel view synthesis quality, and pose estimation accuracy.

📝 Abstract
We present TokenSplat, a feed-forward framework for joint 3D Gaussian reconstruction and camera pose estimation from unposed multi-view images. At its core, TokenSplat introduces a Token-aligned Gaussian Prediction module that aligns semantically corresponding information across views directly in the feature space. Guided by coarse token positions and fusion confidence, it aggregates multi-scale contextual features to enable long-range cross-view reasoning and reduce redundancy from overlapping Gaussians. To further enhance pose robustness and disentangle viewpoint cues from scene semantics, TokenSplat employs learnable camera tokens and an Asymmetric Dual-Flow Decoder (ADF-Decoder) that enforces directionally constrained communication between camera and image tokens. This maintains clean factorization within a feed-forward architecture, enabling coherent reconstruction and stable pose estimation without iterative refinement. Extensive experiments demonstrate that TokenSplat achieves higher reconstruction fidelity and novel-view synthesis quality in pose-free settings, and significantly improves pose estimation accuracy compared to prior pose-free methods. Project page: https://kidleyh.github.io/tokensplat/.
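The "directionally constrained communication" between camera and image tokens can be pictured as an attention mask that blocks one direction of information flow. The sketch below is illustrative only and is not the authors' ADF-Decoder: the abstract does not say which direction is blocked, so it assumes that camera tokens may read image tokens while image tokens may not read camera tokens. The function names `build_asymmetric_mask` and `masked_attention` are hypothetical, and learned projections, multiple heads, and layers are omitted.

```python
import math


def build_asymmetric_mask(num_cam, num_img):
    """Return mask[i][j] == True when token i may attend to token j.

    Tokens are ordered camera-first, then image tokens.
    ASSUMPTION (not stated in the abstract): camera tokens read everything,
    image tokens read only other image tokens.
    """
    n = num_cam + num_img
    mask = [[True] * n for _ in range(n)]
    for i in range(num_cam, n):        # rows belonging to image tokens
        for j in range(num_cam):       # columns belonging to camera tokens
            mask[i][j] = False         # block the image -> camera read
    return mask


def masked_attention(queries, keys, values, mask):
    """Single-head scaled dot-product attention with a boolean mask."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for i, q in enumerate(queries):
        # Blocked pairs get a score of -inf, so their softmax weight is 0.
        scores = [
            scale * sum(a * b for a, b in zip(q, k)) if mask[i][j] else -math.inf
            for j, k in enumerate(keys)
        ]
        top = max(scores)              # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
        )
    return outputs


# One camera token followed by two image tokens (toy 2-D features).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mask = build_asymmetric_mask(num_cam=1, num_img=2)
updated = masked_attention(tokens, tokens, tokens, mask)
```

Under this assumed direction, changing the camera token leaves the updated image tokens untouched, which is one simple way to keep viewpoint cues from leaking into scene features.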
Problem

Research questions and friction points this paper is trying to address.

3D reconstruction
camera pose estimation
pose-free
multi-view images
Gaussian splatting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Token-aligned Gaussian Prediction
pose-free reconstruction
Asymmetric Dual-Flow Decoder
feed-forward 3D reconstruction
camera pose estimation
Yihui Li
Beihang University
Chengxin Lv
State Key Laboratory of Complex and Critical Software Environment, Beijing, China; School of Computer Science and Engineering, Beihang University, China
Zichen Tang
State Key Laboratory of Complex and Critical Software Environment, Beijing, China; School of Artificial Intelligence, Beihang University, China
Hongyu Yang
School of Artificial Intelligence, Beihang University, China
Di Huang
Computer Science and Engineering, Beihang University
Computer Vision, Representation Learning, Generative AI, Embodied AI