NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction

📅 2026-03-04
🤖 AI Summary
This work addresses holistic 3D scene reconstruction from unposed images without pixel-wise alignment, a setting where pixel-aligned approaches struggle with occluded regions and tend to duplicate structures where views overlap. The authors propose an end-to-end, view-agnostic reconstruction method built on a Vision Transformer architecture, which uses a scene-token mechanism to aggregate multi-view information. A diffusion-based 3D decoder then generates complete point clouds that cover both visible and occluded regions. Experiments on scene-level and object-level benchmarks show that the approach outperforms state-of-the-art methods in reconstruction accuracy and completeness.

📝 Abstract
We present NOVA3R, an effective approach for non-pixel-aligned 3D reconstruction from a set of unposed images in a feed-forward manner. Unlike pixel-aligned methods that tie geometry to per-ray predictions, our formulation learns a global, view-agnostic scene representation that decouples reconstruction from pixel alignment. This addresses two key limitations of pixel-aligned 3D reconstruction: (1) it recovers both visible and invisible points, yielding a complete scene representation, and (2) it produces physically plausible geometry with fewer duplicated structures in overlapping regions. To achieve this, we introduce a scene-token mechanism that aggregates information across unposed images and a diffusion-based 3D decoder that reconstructs complete, non-pixel-aligned point clouds. Extensive experiments on both scene-level and object-level datasets demonstrate that NOVA3R outperforms state-of-the-art methods in terms of reconstruction accuracy and completeness.
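
The abstract does not spell out how the scene-token mechanism works, so the following is only an illustrative sketch of one generic way a fixed set of tokens could aggregate information across unposed views: plain cross-attention pooling over the patch tokens of all views. It is not the authors' architecture. The names `aggregate_scene_tokens`, `make_demo`, and `scene_queries` are hypothetical, the random vectors stand in for learned features, and the diffusion-based decoder is not modeled at all.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def aggregate_scene_tokens(scene_queries, views):
    """Cross-attend K scene queries over the patch tokens of all views.

    scene_queries: K vectors of length D (hypothetical learned queries).
    views: list of views, each a list of patch-token vectors of length D.
    Returns K aggregated scene tokens, each of length D.
    """
    # Assumption: all views are flattened into one unordered bag of patch
    # tokens. No camera pose or per-pixel ray is used anywhere.
    patches = [tok for view in views for tok in view]
    dim = len(scene_queries[0])
    scale = 1.0 / math.sqrt(dim)
    scene_tokens = []
    for q in scene_queries:
        # Attention weights of this query over every patch from every view.
        weights = softmax([dot(q, p) * scale for p in patches])
        # Weighted average of patch tokens, one output dimension at a time.
        token = [sum(w * p[d] for w, p in zip(weights, patches))
                 for d in range(dim)]
        scene_tokens.append(token)
    return scene_tokens


def make_demo(num_views=3, patches_per_view=4, dim=8,
              num_scene_tokens=2, seed=0):
    """Build random stand-in features for a quick experiment."""
    rng = random.Random(seed)

    def rand_vec():
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]

    views = [[rand_vec() for _ in range(patches_per_view)]
             for _ in range(num_views)]
    queries = [rand_vec() for _ in range(num_scene_tokens)]
    return queries, views
```

In this sketch the output size depends only on the number of scene queries, not on how many views are supplied, and calling `aggregate_scene_tokens(queries, views[::-1])` yields the same tokens as the original view order up to floating-point rounding. That is the sense in which such a pooled representation is view-agnostic rather than tied to the pixels of any one image.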
Problem

Research questions and friction points this paper is trying to address.

amodal 3D reconstruction
non-pixel-aligned
unposed images
scene representation
3D geometry
Innovation

Methods, ideas, or system contributions that make the work stand out.

non-pixel-aligned
amodal 3D reconstruction
scene-token mechanism
diffusion-based 3D decoder
view-agnostic representation