TokenSplat: Token-aligned 3D Gaussian Splatting for Feed-forward Pose-free Reconstruction

📅 2026-02-28

🤖 AI Summary

This work proposes a feed-forward framework that jointly performs high-fidelity 3D Gaussian reconstruction and camera pose estimation from unposed multi-view images, that is, without any prior pose information. The method has three components: a Token-aligned Gaussian Prediction module, learnable camera tokens, and an Asymmetric Dual-Flow Decoder (ADF-Decoder). The Token-aligned Gaussian Prediction module aligns semantically corresponding information across views directly in feature space. It uses coarse token positions and fusion confidence to aggregate multi-scale contextual features, which supports long-range cross-view reasoning and reduces redundancy from overlapping Gaussians. The ADF-Decoder enforces directionally constrained communication between camera and image tokens, disentangling viewpoint cues from scene semantics. Because the whole pipeline stays feed-forward, no iterative refinement is required. Experiments show higher reconstruction fidelity and novel-view synthesis quality than existing pose-free methods, along with significantly better pose estimation accuracy.
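The summary says that coarse token positions and fusion confidence are used to reduce redundancy from overlapping Gaussians, but it does not say how. The following is a minimal NumPy sketch of one plausible stand-in, assuming that tokens from all views are pooled and that tokens whose coarse 3D positions fall into the same voxel are merged by a confidence-weighted average. The voxel hashing, the function name `fuse_tokens_by_position`, and the `voxel_size` parameter are hypothetical illustrations, not TokenSplat's actual module.

```python
import numpy as np

def fuse_tokens_by_position(positions, features, confidence, voxel_size):
    """Hypothetical stand-in: merge tokens whose coarse 3D positions share a voxel.

    positions:  (N, 3) coarse 3D token positions, pooled across all views
    features:   (N, C) per-token features
    confidence: (N,) non-negative fusion confidences used as merge weights
    Returns fused positions (M, 3), fused features (M, C), and the voxel
    index assigned to each input token (N,), with M <= N.
    """
    keys = np.floor(positions / voxel_size).astype(np.int64)  # integer voxel coordinates
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep 1-D across NumPy versions
    m = int(inverse.max()) + 1
    w = confidence + 1e-8  # avoid division by zero for all-zero voxels
    wsum = np.bincount(inverse, weights=w, minlength=m)
    fused_pos = np.zeros((m, 3))
    np.add.at(fused_pos, inverse, positions * w[:, None])  # weighted sum per voxel
    fused_feat = np.zeros((m, features.shape[1]))
    np.add.at(fused_feat, inverse, features * w[:, None])
    return fused_pos / wsum[:, None], fused_feat / wsum[:, None], inverse

# Two views observe the same surface point; a third token is elsewhere.
positions = np.array([[0.10, 0.10, 0.10],   # view A
                      [0.12, 0.11, 0.09],   # view B, same point as view A
                      [1.50, 0.00, 0.00]])  # view B, distinct point
features = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
confidence = np.array([3.0, 1.0, 1.0])
fused_pos, fused_feat, inverse = fuse_tokens_by_position(
    positions, features, confidence, voxel_size=0.5)
print(inverse.tolist())                  # [0, 0, 1]
print(fused_feat.round(2).tolist())      # [[0.75, 0.25], [2.0, 2.0]]
```

Three input tokens collapse to two fused ones, and the higher-confidence view dominates the merged feature. A learned module would presumably replace the hard voxel grid with soft, feature-space alignment. The sketch only shows why position plus confidence is enough to deduplicate overlapping predictions.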

📝 Abstract

We present TokenSplat, a feed-forward framework for joint 3D Gaussian reconstruction and camera pose estimation from unposed multi-view images. At its core, TokenSplat introduces a Token-aligned Gaussian Prediction module that aligns semantically corresponding information across views directly in the feature space. Guided by coarse token positions and fusion confidence, it aggregates multi-scale contextual features to enable long-range cross-view reasoning and reduce redundancy from overlapping Gaussians. To further enhance pose robustness and disentangle viewpoint cues from scene semantics, TokenSplat employs learnable camera tokens and an Asymmetric Dual-Flow Decoder (ADF-Decoder) that enforces directionally constrained communication between camera and image tokens. This maintains clean factorization within a feed-forward architecture, enabling coherent reconstruction and stable pose estimation without iterative refinement. Extensive experiments demonstrate that TokenSplat achieves higher reconstruction fidelity and novel-view synthesis quality in pose-free settings, and significantly improves pose estimation accuracy compared to prior pose-free methods. Project page: https://kidleyh.github.io/tokensplat/.
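"Directionally constrained communication between camera and image tokens" can be read as an asymmetric attention mask. The abstract does not state which direction is allowed, so this minimal NumPy sketch assumes, hypothetically, that camera tokens may read from image tokens but image tokens may not read from camera tokens. Under that assumption, viewpoint information cannot leak into scene features. The token layout, the helper names `asymmetric_mask` and `masked_attention`, and the single-head attention without learned projections are all illustrative assumptions, not the ADF-Decoder itself.

```python
import numpy as np

def asymmetric_mask(n_cam: int, n_img: int) -> np.ndarray:
    """Boolean mask where mask[q, k] is True if query token q may attend to key k.

    Assumed layout: the first n_cam rows/columns are camera tokens, the rest are
    image tokens. Assumed direction (hypothetical): camera tokens read from every
    token, while image tokens read only from image tokens.
    """
    n = n_cam + n_img
    mask = np.ones((n, n), dtype=bool)
    mask[n_cam:, :n_cam] = False  # block image queries from camera keys
    return mask

def masked_attention(x: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Single-head self-attention with Q = K = V = x (no learned projections)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = np.where(mask, scores, -np.inf)               # disallowed pairs get zero weight
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
n_cam, n_img, dim = 2, 6, 8
x = rng.standard_normal((n_cam + n_img, dim))
mask = asymmetric_mask(n_cam, n_img)
out = masked_attention(x, mask)

# Perturb only the camera tokens: image-token outputs must not change.
x_perturbed = x.copy()
x_perturbed[:n_cam] += 10.0
out_perturbed = masked_attention(x_perturbed, mask)
print(np.allclose(out[n_cam:], out_perturbed[n_cam:]))  # True
```

The perturbation check captures the intended factorization: changing camera tokens alters only the camera-token outputs, so the image stream stays free of viewpoint cues. A real decoder would stack such layers with learned projections and possibly a second, reversed flow.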

Problem

Research questions and challenges this paper addresses.

3D reconstruction

camera pose estimation

pose-free

multi-view images

Gaussian splatting

Innovation

Methods, ideas, or system contributions that make the work stand out.

Token-aligned Gaussian Prediction

pose-free reconstruction

Asymmetric Dual-Flow Decoder