Off The Grid: Detection of Primitives for Feed-Forward 3D Gaussian Splatting

📅 2025-12-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Feed-forward 3D Gaussian Splatting (3DGS) suffers from coarse primitive localization, low efficiency, and severe rendering artifacts due to its reliance on fixed pixel grids. Method: We propose a sub-pixel, adaptive primitive detection architecture. Our approach introduces a keypoint-inspired multi-resolution decoder that enables pose-agnostic, end-to-end self-supervised learning of sparse Gaussian primitive distributions. We integrate multi-scale feature decoding, self-supervised 3D reconstruction, and differentiable rasterization. Notably, we empirically find that optimizing Gaussian rendering concurrently improves camera pose estimation accuracy. Results: Our method achieves state-of-the-art performance among real-time feed-forward 3DGS models: novel-view synthesis completes in seconds; the primitive count is reduced by over 50%; rendering artifacts are significantly suppressed; and geometric detail fidelity is markedly enhanced, demonstrating superior efficiency, accuracy, and visual quality.

📝 Abstract
Feed-forward 3D Gaussian Splatting (3DGS) models enable real-time scene generation but are hindered by suboptimal pixel-aligned primitive placement, which relies on a dense, rigid grid and limits both quality and efficiency. We introduce a new feed-forward architecture that detects 3D Gaussian primitives at a sub-pixel level, replacing the pixel grid with an adaptive, "Off The Grid" distribution. Inspired by keypoint detection, our multi-resolution decoder learns to distribute primitives across image patches. This module is trained end-to-end with a 3D reconstruction backbone using self-supervised learning. Our resulting pose-free model generates photorealistic scenes in seconds, achieving state-of-the-art novel view synthesis for feed-forward models. It outperforms competitors while using far fewer primitives, demonstrating a more accurate and efficient allocation that captures fine details and reduces artifacts. Moreover, we observe that by learning to render 3D Gaussians, our 3D reconstruction backbone improves camera pose estimation, suggesting opportunities to train these foundational models without labels.
Problem

Research questions and friction points this paper is trying to address.

Improves primitive placement for real-time 3D scene generation
Replaces rigid grid with adaptive sub-pixel primitive detection
Enhances quality and efficiency using fewer primitives
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sub-pixel primitive detection replaces rigid grid
Multi-resolution decoder distributes primitives adaptively
Self-supervised end-to-end training improves pose estimation
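The keypoint-inspired idea above (replacing the fixed pixel grid with learned sub-pixel detections distributed across image patches) can be illustrated with a soft-argmax head, a standard trick from keypoint detection. This is a minimal sketch under assumptions: the heatmap input, patch size, and temperature below are illustrative, not the paper's actual architecture.

```python
import numpy as np

def subpixel_primitive_locations(heatmap, patch=8, temperature=0.1):
    """Keypoint-style soft-argmax: one sub-pixel 2D location per image patch.

    heatmap: (H, W) detection scores from a hypothetical decoder head.
    Returns an (N, 2) array of (x, y) image coordinates, one per patch,
    placed at the softmax-weighted centroid of that patch's scores --
    a continuous location rather than a fixed grid cell center.
    """
    H, W = heatmap.shape
    assert H % patch == 0 and W % patch == 0
    ys, xs = np.meshgrid(np.arange(patch), np.arange(patch), indexing="ij")
    locs = []
    for py in range(0, H, patch):
        for px in range(0, W, patch):
            cell = heatmap[py:py + patch, px:px + patch]
            # numerically stable softmax over the patch
            w = np.exp((cell - cell.max()) / temperature)
            w /= w.sum()
            # softmax-weighted centroid -> sub-pixel coordinate
            locs.append((px + (w * xs).sum(), py + (w * ys).sum()))
    return np.array(locs)
```

Because the soft-argmax is differentiable, such a head can be trained end-to-end through a rendering loss, consistent with the self-supervised setup described above; in the full method the detections would also be lifted to 3D Gaussian centers, which this sketch omits.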
Arthur Moreau
Huawei Noah's Ark London
3D computer vision, neural rendering, virtual humans, visual localization, pose estimation
Richard Shaw
University College London
Michal Nazarczuk
Huawei Noah’s Ark Lab
Jisu Shin
Huawei Noah’s Ark Lab
Thomas Tanay
Huawei Noah’s Ark Lab
Zhensong Zhang
Huawei Noah’s Ark Lab
Songcen Xu
Huawei Noah’s Ark Lab
Eduardo Pérez-Pellitero
Principal Research Scientist
Neural Rendering, Computational Photography, Machine Learning